PREFACE


This book is intended primarily as a text for a one- or
two-semester course for students who have had little or no previous
calculus or statistics. It has two main purposes: 

1. To develop some basic mathematico-logical concepts of 
statistics, particularly the logic of statistical inference. 

2. To develop an understanding of the language used in
mathematical statistics, including elementary calculus. It is
assumed that understanding must include some facility in reading
compact mathematical expressions and in applying mathematical
theorems to empirical problems without extended
explanations of what the theorems mean.

This approach is based on the premise that the college student, 
whether oriented toward applications or toward mathematics, 
can best spend his time and energy in a first course in statistics 
in mastering some of the abstract concepts, i.e., mathematical 
models, and some of the mathematical language of the field. 
Because of the stress on abstract definitions and mathematical 
models, applications have for the most part been subordinated; 
in many cases examples and exercises have deliberately been 
made trivial so as not to detract from the abstract structure. 
All data in this text are hypothetical, though many of them are 
characteristic of those actually obtained. Ideally, genuine data 
on genuine problems should be presented, and I admire the 
thoroughness with which some authors have done this, but on 
the other hand genuine data involve technicalities of special 
fields, especially since, to do justice to the data and to avoid 
overgeneralizations, the specific conditions under which actual 
data were obtained should be presented. It was thought
worthwhile to avoid these technicalities; however, many of the 
types of problems to which statistical applications have been 
made are clearly indicated. 

The sequence of the book is as follows: An attempt is made 
to get the student to build up a precise understanding of what 
is meant by “expected sampling distribution of a statistic” 
(later the word “expected” is dropped) at as early a stage as 
possible. It was believed that the best way to accomplish this 
was first to introduce combinatorial theorems and exercises, so 
that the student would be able to work out some simple sampling 
distributions for himself. The third chapter, on statistical 
inference, which is the most important one in the book, presents 
the concepts of expected sampling distribution, testing of 
hypotheses, confidence intervals, and power of a test without 
introducing other than finite populations; it seemed to the author 
that the student can best understand these concepts without 
having to be concerned with infinity, continuity, density, or the 
normal curve. In this feature the book was unique at the time 
of writing, but since that time another text incorporating essentially
this same feature has been published.

Definitions which are often given only implicitly (e.g., population,
distribution, discrete) are made explicit, and for this purpose
I have utilized a common concept of mathematical logic,
that of a class of ordered pairs. The introduction of this concept
may appear slightly pedantic, but it seemed justified by
its simplicity and precision.

No attempt is made to give the student an understanding of 
the problems involved in a rigorous empirical definition of 
“random sample”; errors in application do not lie at this 
level. 

Not until the student has already been introduced to most
of the concepts peculiar to statistics is the topic of continuous
distributions introduced. At this point I have attempted to
present the fundamental concepts of the calculus: limit, derivative,
and integral. It is admittedly radical to take the position
that these concepts can be learned by the student as an incidental
part of a course in statistics; however, most teachers who
are shocked at the idea of requiring their students to learn a
little of the calculus seem to overlook the fact that most courses
in calculus spend most of their time on techniques of differentiation
and integration and on particular kinds of applications
rather than on the fundamental concepts. Any student who
has the intelligence necessary to understand the subtle concepts
of statistical inference can certainly understand those of
elementary calculus.

With the concepts and language of elementary calculus available,
it is then easy to present discussions of the normal, t, and
χ² distributions. Regression and correlation are discussed in
the context of bivariate distributions. Analysis of variance is
introduced, but no attempt is made to go into the many complexities
of this topic. The concluding chapter is a short one
on nonparametric statistics.

Beginning with the chapter on normal distributions, it is 
necessary to state many theorems without proving them. I 
must apologize for the rather tiresome repetition of “we state 
without proof,” but I wanted the student to be reasonably 
clear about what he is expected to take on faith. For those 
with a background in calculus many proofs are presented in the 
Appendix; for the convenience of teachers, other proofs, usually
given by Cramér, Hoel, Mood, or Wilks, are referred to in
footnotes.

The book becomes increasingly compact in the language used 
and often leaves out details which should be filled in by the 
student, who will find it essential to do a considerable number of 
the exercises as he reads the book. 

In the last chapter there are almost no examples, and this 
chapter can perhaps be used as one test of whether the student 
has learned to understand simple mathematical language. 

A brief development of elementary calculus and an introduction
to mathematical statistics precede the proofs in the
Appendix. I doubt that those having no previous calculus will
be able to follow the subsequent proofs on the basis of the
Appendix alone, but this development may be helpful as a
review and it may be useful to teachers with heterogeneous
classes who want to supplement the body of the text. In
particular, it may be desirable or even necessary to supplement
the introduction to calculus given so briefly in Chap. 8.

It is with considerable humility that the author, who is not a
mathematician, offers a text that has as its purpose the presentation
of some basic mathematical concepts; it is hoped that it
will be found reasonably adequate both by those nonmathematicians
for whom it was primarily written and also by mathematicians
interested in how well those outside their field
understand it.

There are many to whom I am indebted. Among these are
D. A. Grant, with whom I first studied statistics and who
aroused my interest in the subject, Harold Gulliksen, and S. S.
Wilks, with whom I did further study. Not being a mathematician,
I have drawn considerably from other authors,
particularly S. S. Wilks, Harald Cramér, A. M. Mood, and P. G.
Hoel, and, in the calculus, Richard Courant. I hope that at no
point have I stepped over the boundary of plagiarism, but in
mathematics it is difficult to know just where that boundary is.
Professor Wilks was kind enough to look over part of my manuscript
and to make some helpful comments, pro and con my
point of view. I am also indebted to Harold Freeman, who
made numerous helpful criticisms and suggestions.

Grateful acknowledgment is made also to Catherine Thompson
and Maxine Merrington, and to E. S. Pearson, editor of
Biometrika, for permission to include Tables D.3 and D.5, which
are abridged versions of tables published in Biometrika. I am
indebted to Professor Sir Ronald A. Fisher, Cambridge, to
Dr. Frank Yates, Rothamsted, and to Messrs. Oliver & Boyd,
Ltd., Edinburgh, for permission to reprint Table D.4, which is
an abridgment from their book, Statistical tables for use in
biological, agricultural, and medical research; to A. M. Mood,
for permission to reprint from his book, Introduction to the theory
of statistics, the tables referred to above, which Professor Mood
abridged; to W. J. Dixon and F. J. Massey, for permission to
reprint Fig. 11.1.1 and Tables D.3, D.6, and D.7 from their
book, Introduction to statistical analysis; to Frieda S. Swed and
C. Eisenhart and to the editor of the Annals of Mathematical
Statistics for permission to reprint Table D.7, which first
appeared in the Annals; to J. P. Guilford, for permission to
reprint Tables C.2, C.4, and C.5 from his book, Fundamental
statistics in psychology and education; to Professor Sir Ronald A.
Fisher and Messrs. Oliver & Boyd, Ltd., Edinburgh, for permission
to reprint Table C.5, derived (by Guilford) from Statistical
methods for research workers; to H. Sorenson for permission to
reprint Table C.1 from Statistics for students of psychology and
education; and to Frank Wilcoxon and the American Cyanamid
Company for permission to reprint Table D.8 from Some rapid
approximate statistical procedures.

My greatest debt is to my wife, Pauline Austin Adams, without
whose tremendous aid and encouragement this manuscript
might never have been completed.


Joe Kennedy Adams 








CONTENTS

Preface

Chapter 1. Finite Populations and Their Distributions
1.1. Problems dealt with by statistics
1.2. Sets and values
1.3. Populations
1.4. Frequency functions of finite populations
1.5. Cumulative distribution functions of finite populations
1.6. Graphic methods of showing distributions
1.7. Some simple problems involving distributions
1.8. Some combinatorial considerations

Chapter 2. Sampling from a Finite Population
2.1. Expected sampling distributions
2.2. Random sampling
2.3. Probability
2.4. The calculus of probabilities

Chapter 3. Statistical Inference
3.1. Expected sampling distributions yielded by different hypotheses
3.2. Testing hypotheses about the population
3.3. Confidence intervals
3.4. Power of a test
3.5. Increasing the power of a test by increasing the number of observations
3.6. A word of caution
3.7. The logic of statistical inference
3.8. Two types of error
3.9. Confidence intervals with one bound only

Chapter 4. Parameters and Statistics
4.1. Definitions
4.2. Measures of central tendency
4.3. Measures of variability
4.4. Abbreviations
4.5. Moments
































4.6. Computation of moments
4.7. Transformations of values for computational purposes
4.8. Grouping of data into class intervals

Chapter 5. Hypergeometric and Binomial Distributions
5.1. Hypergeometric distributions
5.2. Moments of a hypergeometric distribution
5.3. Binomial distributions
5.4. Moments of a binomial distribution
5.5. Binomial as limit of hypergeometric
5.6. Fitting a binomial to sample data

Chapter 6. Poisson Distributions
6.1. A Poisson as an approximation to a binomial
6.2. Computation of a Poisson
6.3. Definition
6.4. Moments of a Poisson distribution
6.5. Poisson distributions as exact

Chapter 7. Discrete Distributions
7.1. Definition
7.2. The logic of statistical inference for discrete distributions

Chapter 8. Continuous Distributions
8.1. Continuous populations
8.2. Proportion density
8.3. Continuous distributions
8.4. Definite integrals
8.5. Indefinite integrals
8.6. A physical model for distributions
8.7. Moments of continuous distributions

Chapter 9. Normal Distributions
9.1. A normal distribution as an approximation to a binomial distribution
9.2. Definition of normal distributions
9.3. Moments of normal distributions
9.4. The special case c (= μ) = 0 and k (= σ) = 1
9.5. Tests of hypotheses and confidence intervals
9.6. The central-limit theorem
9.7. Confidence interval for a mean of a nonnormal population
9.8. Difference between two independent normally distributed variates
9.9. Fitting a normal distribution to a sample

Chapter 10. Chi Square
10.1. Definition of chi square
10.2. Goodness of fit when the hypothetical distribution is completely specified
10.3. Goodness of fit when the hypothetical distribution is incompletely specified































10.4. Test of independence in a contingency table
10.5. Tests of homogeneity
10.6. Test for variance of a normal population

Chapter 11. “Student’s” t Distributions
11.1. Definition of t_n and the distribution of t_n
11.2. Testing hypotheses about population means and finding confidence intervals with small samples from normal populations with unknown variances
11.3. A criterion for the discarding of exceptional observations and the testing of a difference between the mean of a subsample and the mean of the sample

Chapter 12. Bivariate Distributions
12.1. Definition and properties of bivariate distributions
12.2. Regression
12.3. Linear regression
12.4. The correlation ratio
12.5. The sampling distribution of r
12.6. The scatter plot
12.7. Computation of r

Chapter 13. F Distributions and the Analysis of Variance
13.1. Definition of F
13.2. Distribution of F
13.3. The ratio of unbiased estimates of two population variances
13.4. The special case σ_1^2 = σ_2^2
13.5. The special case μ_1 = μ_2; σ_1^2 = σ_2^2
13.6. The case of several groups
13.7. The partition of a sample into k groups
13.8. The double partition of a sample
13.9. Computation of sums of squares

Chapter 14. Nonparametric Statistics
14.1. Definition
14.2. The sign test
14.3. The run test
14.4. Tolerance limits
14.5. Order statistics
14.6. Confidence intervals for percentile points
14.7. Wilcoxon’s matched-pairs signed-ranks test

References

Appendix A. Some Hints on How to Ask Questions of Mathematical Statisticians

Appendix B. Mathematical Appendix
B.1. Limit of a sequence
B.2. Limit of a series




































B.3. Continuous functions
B.4. The definite integral
B.5. The derivative
B.6. Primitive functions
B.7. Indefinite integrals
B.8. Fundamental theorem of the calculus
B.9. Distribution functions (continuous case)
B.10. Mathematical expectation
B.11. Power series
B.12. Moment-generating functions
B.13. Change of variable
B.14. Multiple integration
B.15. Joint distributions of random variables
B.16. Expectation of sample moments about the origin
B.17. Mean and variance of the sum of independent random variables
B.18. The law of large numbers
B.19. Tchebysheff’s inequality
B.20. Expectation of the sample variance
B.21. Properties of normal distributions
B.22. Properties of chi-square distributions
B.23. The distribution of t_n
B.24. The distribution of F_{m,n}
B.25. The linear mean regression line
B.26. Bivariate normal distributions

Appendix C. Miscellaneous Tables
C.1. Squares and square roots of numbers from 1 to 1,000
C.2. Four-place common logarithms of numbers (base 10)
C.3. Natural logarithms (base e)
C.4. Trigonometric functions
C.5. Transformation of r to z (and ρ to ζ)
C.6. Derivatives
C.7. Primitive functions (indefinite integrals)

Appendix D. Tables of Sampling Distributions
D.1. Ordinates of the normal density function with zero mean and unit variance
D.2. Cumulative normal distributions
D.3. Cumulative chi-square distributions
D.4. Cumulative t distributions
D.5. Cumulative F distributions
D.6. Critical values of r for sign test
D.7. Confidence limits for number of runs
D.8. Probabilities for Wilcoxon’s matched-pairs signed-ranks test

Glossary

Answers to Odd-numbered Exercises

Index































BASIC STATISTICAL CONCEPTS 




Chapter 1 

FINITE POPULATIONS AND 
THEIR DISTRIBUTIONS 


1.1. Problems Dealt with by Statistics. Statistics deals with 
the following kinds of problems: 

1. The Descriptive Problem. A set of observations or conceivable
observations is often conceptually unwieldy; it is necessary
to organize and condense such a set into an understandable and 
convenient form. This kind of problem ranges from very simple 
ones, such as that presented by a set of 1,000 intelligence-test 
scores from 1,000 different individuals under conditions as 
nearly constant as possible, to extremely complex problems, such 
as that presented by a set of 20 different measures on each of 
1,000 different subjects, in a situation in which we are interested 
in the interrelations among the 20 measures. 

2. The Problem of Inference. We may start with a set of well-defined
conditions under which we can make observations and
try to infer what observations we should expect under these
conditions, or, on the other hand, we may start with a set of
observations and wish to infer what the conditions were which
led to that set of observations. For example, we may start with
a group of people composed of 7,000 Republicans and 10,000
non-Republicans and try to infer what to expect (approximately,
that is, within certain limits) if we should draw a group
of 100 people at random from this total group of 17,000; or, conversely,
we may draw a group of 100 people at random from a
total group of 17,000 of unknown composition, observe that 40
are Republicans, and try to infer the total number of Republicans
(approximately, that is, within certain limits) in the group
of 17,000. These problems of inference, like the problems of
description, range from a fairly simple level to a very complex
level; however, even on the simple level the concepts involved
are rather subtle and merit careful and prolonged study.
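
The first direction of inference in the example above (known composition, unknown sample) can be explored numerically. The following Python sketch is an illustration only and is not part of the text's own development: the function names, the seed, and the choice of 1,000 repeated draws are assumptions made for the sketch. It draws 100 people at random, without replacement, from a group of 7,000 Republicans and 10,000 non-Republicans and counts the Republicans in each draw.

```python
import random
from fractions import Fraction

def expected_count(n_marked, n_total, n_drawn):
    """Long-run average number of marked members in a random draw
    made without replacement (n_drawn * n_marked / n_total)."""
    return Fraction(n_marked * n_drawn, n_total)

def draw_count(n_marked, n_total, n_drawn, rng):
    """Draw n_drawn members at random without replacement and
    count how many of them are marked."""
    # Members 0 .. n_marked - 1 stand for Republicans; the rest do not.
    sample = rng.sample(range(n_total), n_drawn)
    return sum(1 for member in sample if member < n_marked)

rng = random.Random(0)  # seed chosen arbitrarily, for repeatability
counts = [draw_count(7000, 17000, 100, rng) for _ in range(1000)]

print(float(expected_count(7000, 17000, 100)))  # 700/17, about 41.18
print(min(counts), max(counts))                 # spread of the observed counts
```

The spread between the smallest and largest counts gives a rough feeling for what "within certain limits" means here; the precise treatment is the subject of later chapters.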

It is with the second kind of problem that we shall be primarily 
concerned; however, the two kinds of problems are by no means 
entirely separate from each other. 

1.2. Sets and Values. Almost any object can be classified in 
many different ways. (The word “object” is used to refer not 
only to material objects but also to conceptual entities such as 
numbers and geometrical forms, and also to such entities as 
families, pairs of genes, specimens, bacterial colonies, insurance 
policies, perceptions, experimental procedures, sets or sequences 
of observations, etc.) For example, we can classify people 
according to age, sex, marital status, height, weight, race, color 
of eyes, score on a certain test, etc., and we can classify positive 
integers (“whole numbers”) according to whether they are even 
or odd, whether they are greater than 10, whether they are 
divisible by 3, etc. We shall call the one and only one category 
into which we classify a certain member of a set the value of that 
member, with the understanding, of course, that it is the value 
only with respect to a given classification scheme. For example, 
if we are using age as the classification scheme, the value of a 
certain person may be twenty-five years; if we are using color of 
eyes, the value of the same person may be blue. 

Definition. The value of a member of a set is the one and only 
one category into which that member is classified according to a 
given classification scheme. 

Examples. 1. If people in a given group are classified according to their
heights, the value of John Stewart Jones may be 6 ft 1 in.; if people in the same
group are classified according to their names, the value of John Stewart Jones
is his name, “John Stewart Jones.”

2. If positive integers in a given set are classified according to their sizes,
the value of an integer is itself; if the integers are classified according to
whether they are even or odd, the value of the number 3 is odd.

It is necessary to distinguish carefully between a member of a
set and its value. The set obtained by tossing a six-sided die and
that obtained by spinning a pointer that can land on any
numeral from 1 through 6 cannot have the same members,
though their members may have the same values.¹

We shall use capital letters (with subscripts) to refer to members
and small letters (with subscripts) to refer to values. For
example, we may say, “Consider the set X_1, X_2, X_3, X_4, X_5 with
values x_1, x_2, and x_3. Let the value of X_1 be x_1, the value of X_2
be x_3, the value of X_3 be x_2, the value of X_4 be x_1, and the value
of X_5 be x_3.”

A shorter way of saying the same thing is as follows: “Consider
the set X_i, i = 1, 2, . . . , 5. Let value (X_1) = x_1,
value (X_2) = x_3, value (X_3) = x_2, value (X_4) = x_1, and value
(X_5) = x_3.”

By letting x be an abbreviation for “value” we can be even
more brief, as follows: “Consider the set X_i, i = 1, 2, . . . , 5.
Let x(X_1) = x_1 (read ‘Let the value of X_1 equal x_1’), x(X_2) = x_3,
x(X_3) = x_2, x(X_4) = x_1, and x(X_5) = x_3.”

1.3. Populations. When each member of a set has a value,
the set is a population or universe. It is incorrect to call any set
a “population” unless the principle of classification is specified,
implicitly or explicitly; thus, it is incorrect to say “a population
of books” or “a population of integers,” but correct to say “a
population of books classified according to author” or “a population
of integers classified according to size.”

It is convenient to think of each member as being paired with 
its value, the value being placed first in each pair. Thus a 
population of five people classified according to age could be the 
class of pairs (24, John Doe, Jr.), (23, Mary Doe), (47, John 
Doe), (46, Jane Doe), (1, John Doe III). These same people 
classified according to sex would be the different population 
(male, John Doe, Jr.), (female, Mary Doe), etc. 
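
The pairing of values with members can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the text's own notation: the function name `population` is hypothetical, and the sexes of the last three people are inferred from their names, since the text gives only the first two pairs of that population.

```python
# Names and ages follow the example in the text; the sexes of the last
# three people are assumptions made for this sketch.
people = {
    "John Doe, Jr.": {"age": 24, "sex": "male"},
    "Mary Doe": {"age": 23, "sex": "female"},
    "John Doe": {"age": 47, "sex": "male"},
    "Jane Doe": {"age": 46, "sex": "female"},
    "John Doe III": {"age": 1, "sex": "male"},
}

def population(members, scheme):
    """Pair each member with its value under the given classification
    scheme, placing the value first in each pair."""
    return {(scheme(member), member) for member in members}

by_age = population(people, lambda name: people[name]["age"])
by_sex = population(people, lambda name: people[name]["sex"])

print(sorted(by_age))
print(sorted(by_sex))
```

The same five members yield two different populations, because a population is determined by the members together with the classification scheme.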

The population described at the end of Sec. 1.2 is the class of
pairs (x_1,X_1), (x_3,X_2), (x_2,X_3), (x_1,X_4), (x_3,X_5).

¹ Strictly speaking, the numerals “1” and “6” should be enclosed in quotation
marks in this sentence. In this text, however, numerals without quotes are used
to refer either to numerals or to numbers, depending upon the context. Quotes
have also been omitted (with some exceptions) when mentioning or defining various
other symbols; this usage is consistent with that of most texts in mathematics,
though not with those in mathematical logic, in which the use and mention of a
symbol are carefully distinguished.
A class of pairs such that within each pair there is a definite
order stipulated is called a class of ordered pairs. A class of
ordered pairs such that if any two pairs have the same second
member they also have the same first member is a function. As
we stipulated that each member of a population (that is, the
second member of a pair) has one and only one value (the first
member of the pair), a population is a certain kind of function.
Unfortunately the student has probably thought of a function
as an expression such as f(x) = x^2 or y = x^2, or as the rule given
by these expressions.¹ Note, however, that these expressions
merely tell us what number to pair with a given number; for
example, with 1 we pair 1, with 2 we pair 4, with 3 we pair 9, etc.
In other words, these expressions give us a class of pairs, of
which some members are (1,1), (4,2), and (9,3). The fact that
we cannot list all the pairs is irrelevant at this point of our discussion;
the point is that neither expressions, rules, nor graphs
are functions; it is the class of ordered pairs that is the function.

Just as we would call the class of pairs given by f(x) = x^2 the
square function, so we can call a population a value function.
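
The defining property of a function, that any two pairs with the same second member also have the same first member, can be checked mechanically for a finite class of pairs. The sketch below is illustrative only; the name `is_function` and the two sample classes are assumptions of the sketch, with the value placed first in each pair as in the text.

```python
def is_function(pairs):
    """Return True if no second member is paired with two different
    first members, i.e., if the class of ordered pairs is a function."""
    first_for = {}
    for first, second in pairs:
        if second in first_for and first_for[second] != first:
            return False
        first_for[second] = first
    return True

# Part of the square function: pairs (x^2, x) for x = -3, ..., 3.
square = {(x * x, x) for x in range(-3, 4)}

# Pairing each of 1 and 4 with both of its square roots is not a function.
both_roots = {(1, 1), (-1, 1), (2, 4), (-2, 4)}

print(is_function(square))      # True
print(is_function(both_roots))  # False
```

Note that several second members may share a first member (both 2 and -2 are paired with 4 in the square function) without violating the definition.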

Definition. A population or universe is a value function, that 
is, a class of ordered pairs such that the second member of each 
pair is a member of a set and the first member of the pair is the 
value of that member of the set. 

Examples. 1. The population of 5 flips of a coin, each flip classified according
to heads or tails as follows: (heads, first flip), (heads, second flip), (tails,
third flip), (heads, fourth flip), (tails, fifth flip).

¹ Thus students are often bewildered when they encounter for the first time functions
which cannot be defined by a simple equation, although such functions are
extremely common in advanced mathematics. An example is: let f(x) = x if x is
a member of the sequence 1, 1/2, . . . , 1/n, . . . ; let f(x) = 0 otherwise.
Students are sometimes also surprised to learn that any constant is a function of a
variable, as it provides us with something (namely, itself) to pair with each
value of the variable. Therefore, a statement that is true of functions in general
is true of constants in particular. For example, if

E[f(x) + g(x)] = E[f(x)] + E[g(x)]

where f and g are any functions whatsoever, then in particular

E(x^2 - 4) = E(x^2) + E(-4)

as g(x) in this case is the constant -4. We leave E undefined at this point.
2. The population of 1,000 students in a certain college, classified according
to the numerical value of the IQ (as measured by the Stanford-Binet): (118,
Roberta Jones), (110, William Williams), etc.


EXERCISES 

1.3.1. Which of the following classes of ordered pairs could not be a population?

a. (4,X_1), (1,X_2), (1,X_3), (2,X_4)

b. (4,X_1), (1,X_2), (2,X_2), (5,X_3)

c. (x_2,X_1), (x_4,X_2), (x_1,X_3), (x_2,X_4)

d. (100,N.Q.), (100,P.Z.), (110,G.X.) 

e. (2,5), (2,6), (3,7) 

f. (2,5), (3,5), (4,6)

1.3.2. Which of the following rules could not give a population, the members
in each case being the first 100 positive integers (whole numbers)?

a. x(X) = X - 2

b. x(X) = 0 if X is even, 1 if X is odd

c. x(X) = ±√X

d. x(X) = √X

e. x(X) = the positive square root of X 

f. x(X) = X if X is even, X - 3 if X is odd

1.3.3. Assuming that each of the following expressions refers to a population,
distinguish between members and values in each case:

a. Diameters of 10,000 screws

b. Satisfactory and unsatisfactory performances in a training course

c. Red and white blood cells in 1 ml of blood

d. Durations of telephone conversations occurring between 11 a.m. and
12 noon in a large city

e. Several thousand radio tubes which have been inspected for defectives

1.4. Frequency Functions of Finite Populations. Although 
each member of a population has one and only one value, it will 
not usually be the case that each value is assumed by only one 
member. The frequency function of a finite population is the 
way in which the members are distributed into the value categories.
A precise definition is the following.

Definition. The frequency function (abbreviated fr.f.) of a 
finite population is the class of all ordered pairs such that the 
second member of each pair is a value and the first member is the 
(nonzero) proportion of the population having that value.¹

¹ The term “relative frequency function” might be more appropriate, but
“frequency function” is a standard term.
Examples. 1. The frequency function of 300 tosses of a six-sided die of
which 46 are 1’s, 51 are 2’s, 53 are 3’s, 47 are 4’s, 50 are 5’s, and 53 are 6’s is
the class of pairs (46/300,1), (51/300,2), (53/300,3), (47/300,4), (50/300,5), (53/300,6).

2. The fr.f. of two tosses of a coin, both of which are heads, is the class
(1, heads).

3. The fr.f. of the population described at the end of Sec. 1.2 is the class
(2/5,x_1), (1/5,x_2), (2/5,x_3).

Just as we used x(X) to abbreviate “the value of X” we shall
use f(x) as an abbreviation for “the proportion of the population
having the value x.” Thus the fr.f. given in Example 1 can be
given by writing f(1) = 46/300; f(2) = 51/300; f(3) = 53/300;
f(4) = 47/300; f(5) = 50/300; f(6) = 53/300. Similarly, the fr.f.
given in Example 2 is given by f(heads) = 1.
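
A frequency function can be computed directly from the list of values of the members. The Python sketch below is an illustration, not part of the text; the name `frequency_function` is hypothetical, and exact fractions are used so that the proportions are not rounded. It reproduces Examples 1 and 2 above.

```python
from collections import Counter
from fractions import Fraction

def frequency_function(values):
    """Map each value that occurs to the proportion of members having it.
    `values` lists the value of every member of a finite population."""
    counts = Counter(values)
    n = len(values)
    return {value: Fraction(count, n) for value, count in counts.items()}

# Example 1: 300 tosses of a six-sided die.
tosses = [1] * 46 + [2] * 51 + [3] * 53 + [4] * 47 + [5] * 50 + [6] * 53
f = frequency_function(tosses)

# Example 2: two tosses of a coin, both heads.
g = frequency_function(["heads", "heads"])

print(f[1], f[5])    # Fraction reduces to lowest terms: 23/150 and 1/6
print(g["heads"])    # 1
```

Only values that actually occur receive an entry, which matches the requirement in the definition that the proportions be nonzero.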

1.5. Cumulative Distribution Functions of Finite Populations.
When the values of a population can be arranged in an order,
from lowest to highest, it is in many cases convenient to indicate
for each value the proportion of the population having values
less than or equal to that value. For example, the population
whose fr.f. is given in Example 1 in Sec. 1.4 can be equally
precisely described by the class of pairs (46/300,1), (97/300,2),
(150/300,3), (197/300,4), (247/300,5), (1,6). In other words, 46/300 of
the population has values less than or equal to 1 (and since 1 is
the lowest value, this means that 46/300 of the population has the
value 1), 97/300 has values less than or equal to 2, 150/300 has
values less than or equal to 3, etc. Obviously the proportion of
the population having values less than or equal to the highest
value is always 1.

Definition. The cumulative distribution function (abbreviated 
c.d.f., or simply F) of a finite population is the class of all 
ordered pairs such that the second member of each pair is a 
value and the first member is the proportion of the population 
having values less than or equal to the second member. 

Example. If a fr.f. is (V&AA), (HA), (H,2), (M,3), then the c.d.f. for the 
same population is (HAH), (HA), (Y&,2), (1,3). 

We shall use F(x) as an abbreviation for “the proportion of 
the population having values less than or equal to x.” Thus, in 
the above example, F(Ai) — ; F( 1) = %; etc. 
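
As a further illustration, the c.d.f. of the die population of Example 1 in Sec. 1.4 can be obtained from its fr.f. by running sums. The following Python sketch is not part of the text; the name `cumulative_distribution` is hypothetical, and the values are assumed to be orderable by Python's built-in sorting.

```python
from fractions import Fraction
from itertools import accumulate

def cumulative_distribution(f):
    """Given a fr.f. as a mapping value -> proportion, return the c.d.f.
    as a mapping value -> proportion of members with values <= it."""
    values = sorted(f)                          # lowest to highest
    totals = accumulate(f[v] for v in values)   # running sums of f
    return dict(zip(values, totals))

f = {
    1: Fraction(46, 300), 2: Fraction(51, 300), 3: Fraction(53, 300),
    4: Fraction(47, 300), 5: Fraction(50, 300), 6: Fraction(53, 300),
}
F = cumulative_distribution(f)

for value in sorted(F):
    print(value, F[value])
```

The last entry is always 1, in agreement with the remark that the proportion of the population having values less than or equal to the highest value is 1.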




If the values of a finite population are qualitative categories 
like schizophrenic, manic-depressive, paranoiac, etc., we can of 
course arrange the values in an arbitrary order and thus obtain a 
c.d.f.; with an arbitrary order, however, the c.d.f. would not be 
meaningful. 

Cumulative distribution functions are sometimes called
merely “distribution functions.” We shall use the term “distribution,”
however, to refer either to a fr.f. or to a c.d.f. Thus
“the distribution of a population” means the way in which the
members are distributed into the value categories, given either
by the fr.f. or by the c.d.f. It should be noted that the fr.f. is a
function of the c.d.f., that is, once the c.d.f. is specified the fr.f.
is also implicitly specified; conversely, once the fr.f. is specified
and an order assigned, either implicitly or explicitly, to the
values, the c.d.f. is also implicitly specified. If the values of a
population are such that x_1 < x_2 < x_3 < . . . < x_n, then

F(x_i) = f(x_1) + f(x_2) + . . . + f(x_i)

f(x_i) = F(x_i) - F(x_{i-1})
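
The second relation can be used to recover a fr.f. from a c.d.f. The sketch below is illustrative only: the name `frequency_from_cdf` and the three-value c.d.f. are hypothetical, and the sketch adopts the convention (an assumption, not stated in the text) that the cumulative proportion below the lowest value is 0, so that f(x_1) = F(x_1).

```python
from fractions import Fraction

def frequency_from_cdf(F):
    """Recover f from F using f(x_i) = F(x_i) - F(x_{i-1}), taking the
    cumulative proportion below the lowest value to be 0."""
    f = {}
    previous = Fraction(0)
    for value in sorted(F):
        f[value] = F[value] - previous
        previous = F[value]
    return f

# A hypothetical c.d.f. on the values 0, 1, 2.
F = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(1)}
f = frequency_from_cdf(F)

print(f[0], f[1], f[2])   # 1/4 1/2 1/4
```

Summing the recovered proportions in order returns the original c.d.f., which is the sense in which each of the two functions implicitly specifies the other.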

1.6. Graphic Methods of Showing Distributions. Figure 
1.6.1 shows some commonly used graphic methods of describing 
frequency functions and cumulative distribution functions. 

1.7. Some Simple Problems Involving Distributions. Suppose
we have two six-sided dice, one red and one blue. What is
the distribution of the population comprised of all possible pairs 
of faces which can turn up if the two dice are tossed, each pair 
having as value the total number of dots appearing on the two 
dice? First let us list the possible pairs of faces, giving within 
each pair the number of dots on the red die first and the number 
of dots on the blue die second. 


(1,1)   (1,2)   . . .   (1,6)
(2,1)   (2,2)   . . .   (2,6)
. . .
(6,1)   (6,2)   . . .   (6,6)

Fig. 1.6.1. Graphic methods of showing distributions. [Figure not reproduced; one panel is a pie chart.]

The population has 36 members. Since value (1,1) = 2, value
(1,2) = 3, etc., the population itself is the class of pairs

[2, (1,1)]   [3, (1,2)]   . . .   [7, (1,6)]
[3, (2,1)]   [4, (2,2)]   . . .   [8, (2,6)]
. . .
[7, (6,1)]   [8, (6,2)]   . . .   [12, (6,6)]


The fr.f. is therefore the class of pairs (1/36, 2), (2/36, 3), etc., given
more conveniently by the following table (also given is the c.d.f.):

 x    f(x)    F(x)       x    f(x)    F(x)
 2    1/36    1/36       8    5/36    13/18
 3    2/36    1/12       9    4/36    5/6
 4    3/36    1/6       10    3/36    11/12
 5    4/36    5/18      11    2/36    35/36
 6    5/36    5/12      12    1/36    1
 7    6/36    7/12

The student should note that, although there is only one combination
of numerals which gives the value 3 and one combination
of numerals which gives the value 2, nevertheless f(3) is
greater than f(2) because we can obtain a 2 and a 1 in two ways
and a 1 and a 1 in only one way. Considerable care must be
exercised in the enumeration of the ways in which an event can
occur; otherwise errors will occur in working even the simplest
problems.
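Because the enumeration is so easy to get wrong by hand, it may help to let a short program do the listing. The sketch below is an illustration in Python, not part of the original text; the variable names are assumptions. It lists all 36 ordered pairs, so that (1,2) and (2,1) are counted separately, and then tabulates f(x) and F(x) for the total number of dots.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Every ordered pair (red, blue); (1, 2) and (2, 1) are distinct members.
pairs = list(product(range(1, 7), repeat=2))

# Value of each member: the total number of dots on the two dice.
counts = Counter(red + blue for red, blue in pairs)
values = sorted(counts)

dice_frf = {x: Fraction(counts[x], len(pairs)) for x in values}
dice_cdf = dict(zip(values, accumulate(dice_frf[x] for x in values)))

for x in values:
    print(x, dice_frf[x], dice_cdf[x])
```

Note that `Fraction` prints proportions in lowest terms, so 2/36 appears as 1/18; the proportions themselves agree with the table.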

In our example with the two dice, suppose we take as value for
each pair, not the total number of dots appearing but simply the
presence of a 2 or a 6 or the absence of both, so that value
(1,1) = absence; value (1,2) = presence; etc. Then the population
is the class of pairs

[absence, (1,1)]    [presence, (1,2)]   . . .   [presence, (1,6)]
[presence, (2,1)]   [presence, (2,2)]   . . .   [presence, (2,6)]
. . .
[presence, (6,1)]   [presence, (6,2)]   . . .   [presence, (6,6)]

The fr.f. is the class of pairs (5/9, presence), (4/9, absence).

Now suppose we paint three faces on the red die blue and 
three faces on the blue die red, and we take as the value of each 
pair simply the number of red faces turning up. Naming the
six faces on each die by red$_1$, red$_2$, red$_3$, blue$_1$, blue$_2$, blue$_3$, the
population becomes

[2, (red$_1$, red$_1$)]    [2, (red$_1$, red$_2$)]    . . .   [1, (red$_1$, blue$_3$)]
[2, (red$_2$, red$_1$)]    [2, (red$_2$, red$_2$)]    . . .   [1, (red$_2$, blue$_3$)]
. . .
[1, (blue$_3$, red$_1$)]   [1, (blue$_3$, red$_2$)]   . . .   [0, (blue$_3$, blue$_3$)]

The fr.f. is therefore (1/4, 0), (1/2, 1), (1/4, 2).
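The last two examples use the same 36 members with different value assignments. A minimal Python sketch of that idea follows; the helper `frf` and the face labels such as `"red1"` are assumptions made for illustration and do not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def frf(members, value_of):
    """Proportion of members having each value under the rule value_of."""
    members = list(members)
    counts = Counter(value_of(m) for m in members)
    return {v: Fraction(c, len(members)) for v, c in counts.items()}


# Second example: presence of a 2 or a 6 on either die.
numbered_pairs = product(range(1, 7), repeat=2)
presence_frf = frf(
    numbered_pairs,
    lambda pair: "presence" if any(face in (2, 6) for face in pair) else "absence",
)

# Third example: each die has three red and three blue faces.
faces = ["red1", "red2", "red3", "blue1", "blue2", "blue3"]
colored_pairs = product(faces, repeat=2)
red_count_frf = frf(
    colored_pairs,
    lambda pair: sum(face.startswith("red") for face in pair),
)

print(presence_frf)
print(red_count_frf)
```

Separating the listing of members from the rule that assigns values mirrors the definition of a population as a class of (value, member) pairs: changing the rule changes the distribution, not the members.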


EXERCISES 


1.7.1. We plan to toss a coin twice. What is the frequency function of
the population of all possible results, taking as value the number of heads
appearing?

1.7.2. We plan to toss a coin three times. What is the frequency function
of the population of all possible results, taking as value the number of heads
appearing? What is the cumulative distribution function?

1.7.3. What is the fr.f. of all possible pairs of letters from the word “simple,”
taking as value the number of s’s?

1.7.4. What is the fr.f. of all possible pairs of letters from the word “statistics,”
taking as value the number of s’s? What is the c.d.f.?

1.7.5. What is the fr.f. of all possible pairs of letters from the word “Mississippi,”
taking as value the number of s’s? What is the c.d.f.?

1.7.6. Illustrate the answers to Exercises 1.7.3 to 1.7.5 with five different
graphic methods.

1.7.7. Find the fr.f. and the c.d.f. (if one exists) for each of the following
populations:

a . (xi,X t ), (x h X 2 ), (x 2 ,Xz), (aq,X 4 ), (# 3 ,X 5 ) 

b. (1,$X_1$), (1,$X_2$), (-1,$X_3$), (1,$X_4$), (0,$X_5$), (1,$X_6$)

c. (absence, $F_1$), (absence, $F_2$), (presence, $F_3$), (absence, $F_4$)

d. (for, Jones), (against, Smith), (against, Doe), (for, Brown) 

1.7.8. Find the fr.f. and the c.d.f. (if it exists) for each of the following
populations:

a. All possible different outcomes (in terms of gene pairs) of a mating of a
male with gene pair (a,b) with a female with gene pair (a,b), where each of the
progeny gets one gene from each parent. (The gene pairs are not ordered.)

b. Same as a, except that parents have gene pairs (a,b) and (a,a).

c. All possible mutual communication channels that can be formed among
four people, each channel connecting two or more people and having as value
the number of people it connects together.

d. All possible communication networks that can be formed among four
people, a network being defined as a particular arrangement of mutual communication
channels and having as value the largest number of people mutually
connected in that particular arrangement. In a given network there may be
people not connected with anyone else, but every network must include at
least one channel.

e. All possible routes of minimum length that a rat could take in finding its
way from the starting point S to the goal G of the maze shown in the accompanying
diagram, taking the number of right turns minus the number of left
turns as the value of each route.

1.8. Some Combinatorial Considerations. The process of 
listing members of a finite population is usually rather tedious. 
Fortunately, the distribution of a finite population can often be 
determined by simple combinatorial considerations. Let us suppose
that we have an urn which contains eight white and five
black balls. What is the distribution of all possible pairs of
balls, each pair having as value the number of white balls in the
pair? There are three possible values, 0, 1, and 2. The proportion
of pairs having a certain value is the number of pairs
having that value divided by the number of pairs in all, that is,

$$f(0) = \frac{\text{number of ways of choosing 2 balls from 5 black balls}}{\text{number of ways of choosing 2 balls from 13 balls}}$$

$$f(1) = \frac{\text{number of ways of choosing 1 ball from 8 white and 1 ball from 5 black balls}}{\text{number of ways of choosing 2 balls from 13 balls}}$$

$$f(2) = \frac{\text{number of ways of choosing 2 balls from 8 white balls}}{\text{number of ways of choosing 2 balls from 13 balls}}$$

In order to prove a simple theorem giving the number of ways
of choosing m objects from n objects, we need a more fundamental
theorem, which we shall now prove.

Lemma.¹ If an event $E_1$ can occur in $w_1$ ways and a subsequent
event $E_2$ can occur (after $E_1$ has occurred) in $w_2$ ways, then
events $E_1$ and $E_2$ can occur in succession in $w_1 w_2$ ways.

Proof. For every one of the $w_1$ ways $E_1$ can occur there are
$w_2$ ways $E_2$ can occur. These ways can be enumerated by the
accompanying table; for example, the pair (1, 2) means that the
first way in which $E_1$ can occur can be paired with the second
way in which $E_2$ can occur.

(1,1)       (1,2)       . . .   (1,$w_2$)
(2,1)       (2,2)       . . .   (2,$w_2$)
. . .
($w_1$,1)   ($w_1$,2)   . . .   ($w_1$,$w_2$)

As the table has $w_1$ rows and $w_2$ columns, the total number of
entries in the table is $w_1 w_2$. Q.E.D.²

Our lemma can easily be generalized to any number of events.
Consider, for example, three events, $E_1$, $E_2$, and $E_3$. Since the
first two can occur in $w_1 w_2$ ways and since the third can occur in
$w_3$ ways for every one of the $w_1 w_2$ ways in which the first two can
occur, we can enumerate the ways in which the three events can
occur by the accompanying table:

(1,1)           (1,2)           . . .   (1,$w_3$)
(2,1)           (2,2)           . . .   (2,$w_3$)
. . .
($w_1 w_2$,1)   ($w_1 w_2$,2)   . . .   ($w_1 w_2$,$w_3$)

Since there are $w_1 w_2$ rows and $w_3$ columns, the total number of
entries is $w_1 w_2 w_3$. Similarly, we could prove the proposition for
4, 5, 6, etc., events. In other words, if it is true for a certain
number of events then it is also true for that number of events
plus one additional event. We now state and prove our theorem
rigorously.

¹ A lemma is a proposition which is proved preliminary to a more important one.

² Q.E.D. is a commonly used abbreviation for “which was to be proved.”

Theorem 1.8.1. If an event $E_1$ can occur in $w_1$ ways, and,
after $E_1$ has occurred, a subsequent event $E_2$ can occur in $w_2$
ways, and, after $E_1$ and $E_2$ have occurred, a subsequent event
$E_3$ can occur in $w_3$ ways, . . . , and, after $E_1, E_2, \ldots, E_{n-1}$
have occurred, a final event $E_n$ can occur in $w_n$ ways, then all n
events can occur in a sequence in $w_1 w_2 \cdots w_n$ ways.

Proof. By the lemma above, the theorem is true for n = 2.
But if the theorem is true for any number of events $n_1$, it must
also be true for $n_1 + 1$ events, because for every one of the
$w_1 w_2 \cdots w_{n_1}$ ways that the first $n_1$ events can occur in a
sequence there are $w_{n_1+1}$ ways that event $E_{n_1+1}$ can occur, as
shown by the accompanying table:

(1,1)   (1,2)   . . .   (1,$w_{n_1+1}$)
(2,1)   (2,2)   . . .   (2,$w_{n_1+1}$)
. . .
($w_1 w_2 \cdots w_{n_1}$,1)   ($w_1 w_2 \cdots w_{n_1}$,2)   . . .   ($w_1 w_2 \cdots w_{n_1}$,$w_{n_1+1}$)

The total number of entries in the table is $w_1 w_2 w_3 \cdots w_{n_1} w_{n_1+1}$
(rows × columns). Therefore the theorem is true for
n = 2, 3, . . . .
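The theorem requires only that the number of ways open to each event be fixed, not that the ways themselves be the same whatever has gone before. A small Python sketch may make this concrete; the five labelled objects are hypothetical. Three objects are taken in sequence from five, so that the ways open to the second event depend on what the first event took, yet there are always 4 of them.

```python
from itertools import permutations
from math import prod

objects = ["a", "b", "c", "d", "e"]  # hypothetical labels

# E1: take any of 5 objects; E2: any of the 4 left; E3: any of the 3 left.
ways_per_event = [5, 4, 3]

# Explicit listing of every sequence of three distinct objects.
sequences = list(permutations(objects, 3))

print(len(sequences), prod(ways_per_event))
```

The listing and the product $w_1 w_2 w_3$ give the same count, as Theorem 1.8.1 asserts.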

As a matter of fact, we did not need our lemma at all; it is
sufficient to point out that the theorem is true for n = 1 and to
prove that if it is true for $n_1$ (that is, a particular value of n), it is
true for $n_1 + 1$. A proof of this kind is called mathematical
induction. Many theorems which are very difficult (in some
cases, perhaps, impossible) to prove by other methods are easily
proved by mathematical induction.

Theorem 1.8.2. The number of permutations (arrangements)
of n objects is $n(n-1)\cdots(1)$ (read “n factorial” and
abbreviated $n!$ or ⌊n).

Proof. We can fill the first position in n different ways; then,
after filling the first position we can fill the second position in
n - 1 different ways; . . . ; finally, we can fill the last position,
after filling all the rest, in only 1 way. Thus, by Theorem 1.8.1,
the number of permutations is n factorial (n!). Q.E.D.

We can now obtain a simple expression for the number of ways 
of choosing m from n objects. 

Theorem 1.8.3. The number of ways of choosing m objects
from n objects (abbreviated $C_m^n$) is equal to

$$\frac{n(n-1)\cdots(n-m+1)}{m!}$$

or, equivalently, $\dfrac{n!}{m!\,(n-m)!}$.

Proof. There are n ways of choosing the first object, n - 1
ways of choosing the second object after choosing the first, n - 2
ways of choosing the third object after choosing the first two,
. . . , n - (m - 1) = n - m + 1 ways of choosing the mth
object after choosing the first m - 1. Therefore, by Theorem
1.8.1, the number of ways of choosing a sequence of m objects is
$n(n-1)\cdots(n-m+1)$. But this number of ways of
choosing a sequence is equal to the number of ways ($C_m^n$) of
choosing a combination of m objects times the number of ways
of permuting m objects, which is m!; that is,

$$C_m^n \, m! = n(n-1)\cdots(n-m+1)$$

or

$$C_m^n = \frac{n(n-1)\cdots(n-m+1)}{m!}$$

Multiplying numerator and denominator by (n - m)!, we obtain

$$C_m^n = \frac{n!}{m!\,(n-m)!}$$

Q.E.D.
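The two forms in Theorem 1.8.3 can be compared with a direct listing. The following Python sketch is offered as an illustration; the function name `choose` is an assumption, while `math.comb` and `itertools.combinations` are standard-library routines used here as independent checks.

```python
from itertools import combinations
from math import comb, factorial


def choose(n, m):
    """n(n-1)...(n-m+1) / m!, the first form in Theorem 1.8.3."""
    sequences = 1
    for k in range(m):
        sequences *= n - k  # ways of choosing an ordered sequence of m
    return sequences // factorial(m)  # each combination is counted m! times


n, m = 13, 2
by_listing = len(list(combinations(range(n), m)))
by_factorials = factorial(n) // (factorial(m) * factorial(n - m))

print(choose(n, m), by_factorials, comb(n, m), by_listing)
```

With n = 13 and m = 2 all four counts agree, and `choose(n, 0)` returns 1, in keeping with the convention 0! = 1.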

We can now return to the problem at the beginning of this
section: What is the distribution of all possible pairs of balls
which can be drawn from an urn containing 8 white and 5 black
balls, each pair having as value the number of white balls in the
pair? The answer is

$$f(0) = \frac{C_2^5}{C_2^{13}}, \qquad f(1) = \frac{C_1^8 \, C_1^5}{C_2^{13}}, \qquad f(2) = \frac{C_2^8}{C_2^{13}}$$

That is, the fr.f. is the class of pairs $(C_2^5/C_2^{13},\ 0)$, $(C_1^8 C_1^5/C_2^{13},\ 1)$,
$(C_2^8/C_2^{13},\ 2)$.

Note that we can describe the fr.f. by the rule

$$f(x) = \frac{C_x^8 \, C_{2-x}^5}{C_2^{13}}$$

with the understanding that 0! = 1 (hence $C_0^n = 1$). The rule
$f(x) = C_x^8 C_{2-x}^5 / C_2^{13}$ is merely a rule giving the frequency function, not
the function itself, which is a class of ordered pairs.
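As a check on the rule, the fr.f. for the urn can be computed in two ways: from the combinatorial rule, and by listing all pairs of balls. The Python sketch below is illustrative; the names `WHITE`, `BLACK`, `DRAWN`, and `rule` are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

WHITE, BLACK, DRAWN = 8, 5, 2


def rule(x):
    """f(x) = C(8, x) C(5, 2 - x) / C(13, 2) for the urn in the text."""
    return Fraction(comb(WHITE, x) * comb(BLACK, DRAWN - x),
                    comb(WHITE + BLACK, DRAWN))


rule_frf = {x: rule(x) for x in range(DRAWN + 1)}

# Independent check: list every pair of balls and count the white ones.
balls = ["white"] * WHITE + ["black"] * BLACK
pairs = list(combinations(range(len(balls)), DRAWN))
counts = {x: 0 for x in range(DRAWN + 1)}
for pair in pairs:
    counts[sum(balls[i] == "white" for i in pair)] += 1
listed_frf = {x: Fraction(c, len(pairs)) for x, c in counts.items()}

print(rule_frf)
print(listed_frf == rule_frf)
```

The dictionary `rule_frf` is the function itself in the sense of the text, a class of (value, proportion) pairs, whereas `rule` is only the rule that generates it.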


EXERCISES 

1.8.1. What is the fr.f. of all possible triples from a set of 7 men and 4
women, each triple having as value the number of sexes represented? What is
the c.d.f.?

1.8.2. What is the fr.f. of all possible triples that can be chosen from a set of
7 men and 4 women, each triple having as value the number of women it contains?
What is the c.d.f.?

1.8.3. What is the fr.f. of all quadruples that it is possible to choose from a
set of 5 white, 4 black, and 3 red cards, each quadruple having as value the
number of colors present? What is the c.d.f.?

1.8.4. There are three routes from A to B and seven routes from B to C.
Of the seven routes from B to C, four have detours. What is the fr.f. of the
population of all possible routes from A to C via B, taking as value the presence
or absence of a detour?

1.8.5. There are 6 physicists and 4 biologists on a 10-man committee. What
is the fr.f. of the population of all possible subcommittees of 2 members, each
having as value the number of physicists it contains? What is the fr.f. if we
take as value simply whether or not both members of the committee are in
the same science?

1.8.6. How many different black and white patterns can be made with
4 black and 5 white cards, using all 9 cards each time in a row? (Hint: Find
the number of ways each pattern can occur, that is, the number of permutations
that look the same.)

1.8.7. Generalize your result in Exercise 1.8.6 to the number of discernibly
different permutations of n objects of which $n_1$ look alike, $n_2$ look alike (though
different from the first $n_1$), . . . , $n_k$ look alike (though different from the first
$k - 1$ groups), so that $n_1 + n_2 + \cdots + n_k = n$.

1.8.8. What is the fr.f. of all possible results of tossing three ordinary dice,
taking as value the presence or absence of exactly two 6’s?

1.8.9. What is the fr.f. of all possible results of tossing three ordinary dice,
taking as value the sum of the numbers represented on the three faces?

1.8.10. What is the fr.f. of all possible results of tossing 4 dice, taking as
value the number of 6’s appearing?

1.8.11. What is the fr.f. of all possible triples of letters from the word
“Mississippi,” taking as value the number of s’s?

1.8.12. What is the fr.f. of all possible quadruples of letters from the word
“statistics,” taking as value the number of s’s?

1.8.13. What is the fr.f. of all possible ways in which a monkey can make a
sequence of 30 choices in a learning experiment, at each choice being confronted
with three alternatives, of which only one is correct (rewarded with
food)? Take the number of correct choices as the value of each possible
sequence.


Chapter 2 

SAMPLING FROM A FINITE 
POPULATION 


2.1. Expected Sampling Distributions. As we have stated 
before, a population is a class of ordered pairs, each member of a 
set being paired with its value. A sample is one or more of these 
ordered pairs, that is, a subclass of the population. A sample 
may vary in size from 1 to the size of the entire population. 

Definition. A sample is a subclass of a population. 

Example. A sample of size 1 from the population (1,$Z_1$), (0,$Z_2$), (1,$Z_3$),
(2,$Z_4$), (1,$Z_5$), (3,$Z_6$) is the pair (1,$Z_3$). Another sample of size 1 is (3,$Z_6$).
A sample of size 2 is (0,$Z_2$) and (3,$Z_6$). A sample of size 6 is the entire
population.

Definition. A statistic is a characteristic of a sample.

Example. In sampling the above population, we might take as a statistic
the arithmetic mean, that is, the sum of the values in the sample divided by
the size of the sample. The mean of the sample (1,$Z_3$) is 1; the mean of the
sample (0,$Z_2$) and (3,$Z_6$) is 1.5. The mean of the sample (1,$Z_1$), (3,$Z_6$), and
(1,$Z_3$) is 5/3.

Once we have decided upon a statistic, we can consider a
population of all possible samples of a given size, each sample
having as value the statistic. For example, the population of all
possible samples of size 2 from the population (1,$X_1$), (4,$X_2$),
(0,$X_3$), and (0,$X_4$), each sample having as value the arithmetic
mean, is the class of pairs [2.5, ($X_1$,$X_2$)], [0.5, ($X_1$,$X_3$)], [0.5,
($X_1$,$X_4$)], [2, ($X_2$,$X_3$)], [2, ($X_2$,$X_4$)], and [0, ($X_3$,$X_4$)]. The distribution
(fr.f.) of this latter population is (1/6, 0), (1/3, 0.5), (1/3, 2),
and (1/6, 2.5) and is called the expected sampling distribution of the
arithmetic mean for random samples of size 2 from the population
(1,$X_1$), (4,$X_2$), (0,$X_3$), and (0,$X_4$).
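The construction just described, listing every sample, attaching the statistic to each, and tabulating proportions, can be sketched in a few lines of Python. The function names `sampling_distribution` and `mean` are assumptions made for this illustration; the population is the one in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sampling_distribution(population, n, statistic):
    """fr.f. of the population of all samples of size n, valued by statistic."""
    samples = list(combinations(population, n))  # members are kept distinct
    counts = Counter(statistic(sample) for sample in samples)
    return {z: Fraction(c, len(samples)) for z, c in sorted(counts.items())}


def mean(sample):
    """Arithmetic mean of the values in a sample of (value, member) pairs."""
    return Fraction(sum(value for value, _ in sample), len(sample))


population = [(1, "X1"), (4, "X2"), (0, "X3"), (0, "X4")]
mean_distribution = sampling_distribution(population, 2, mean)

for z, proportion in mean_distribution.items():
    print(float(z), proportion)
```

Any other statistic, for example the largest value in the sample, can be substituted for `mean` without changing `sampling_distribution`.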

Definition. The expected sampling distribution of a statistic 
z for random samples of size n from a finite population P is the 
distribution of the population of all possible samples of size n 
from the population P, each sample having as value the statistic z.

Example. Suppose there are 11 red and 6 blue marbles in a bowl. What is
the sampling distribution of the number of red marbles in samples of size 4
drawn from this population? In this case the statistic is the number of red
marbles in the sample of 4. There are $C_4^{17}$ possible samples; of these, $C_4^6$ contain
no red marbles, $C_1^{11} C_3^6$ contain 1 red marble, $C_2^{11} C_2^6$ contain 2 red marbles,
$C_3^{11} C_1^6$ contain 3, and $C_4^{11}$ contain 4. The expected sampling distribution is
therefore $(C_4^6/C_4^{17},\ 0)$, $(C_1^{11} C_3^6/C_4^{17},\ 1)$, $(C_2^{11} C_2^6/C_4^{17},\ 2)$,
$(C_3^{11} C_1^6/C_4^{17},\ 3)$, $(C_4^{11}/C_4^{17},\ 4)$.
This expected sampling distribution is conveniently given by the rule

$$f(z) = \frac{C_z^{11} \, C_{4-z}^6}{C_4^{17}}$$
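For readers who wish to see the numbers, the rule can be evaluated term by term using Theorem 1.8.3. The arithmetic below is supplied as a worked check and is not part of the original example; the fractions are left unreduced so that the common denominator remains visible.

```latex
\begin{align*}
C_4^{17} &= \frac{17 \cdot 16 \cdot 15 \cdot 14}{4!} = 2380 \\
f(0) &= \frac{C_0^{11}\, C_4^{6}}{C_4^{17}} = \frac{1 \cdot 15}{2380} = \frac{15}{2380} \\
f(1) &= \frac{C_1^{11}\, C_3^{6}}{C_4^{17}} = \frac{11 \cdot 20}{2380} = \frac{220}{2380} \\
f(2) &= \frac{C_2^{11}\, C_2^{6}}{C_4^{17}} = \frac{55 \cdot 15}{2380} = \frac{825}{2380} \\
f(3) &= \frac{C_3^{11}\, C_1^{6}}{C_4^{17}} = \frac{165 \cdot 6}{2380} = \frac{990}{2380} \\
f(4) &= \frac{C_4^{11}\, C_0^{6}}{C_4^{17}} = \frac{330 \cdot 1}{2380} = \frac{330}{2380} \\
15 + 220 + 825 + 990 + 330 &= 2380
\end{align*}
```

The numerators add up to the total number of samples, so the proportions sum to 1, as they must for any fr.f.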

2.2. Random Sampling. The term “random sample” is often
used in books in statistics, and we shall use it also. Strictly
speaking, however, a given sample can be no more random than 
any other sample; the term “random” refers to the method of 
drawing the sample rather than the sample itself. Usually two 
conditions are given as necessary and sufficient for random 
sampling: 

1. Each member of the population is just as likely to be 
included in the sample as any other member. 

2. The likelihood that any given member of the population 
will be included in the sample is not affected by the inclusion of 
any other particular member. In sampling a finite population 
this condition cannot literally be satisfied, that is, it must be 
modified slightly to read “is affected equally by” instead of “is 
not affected by.” 

It is possible to satisfy the first condition and not the second. 
For example, suppose that we have 4 chips in a bowl but that 2 of 
them are stuck together and the other 2 are also stuck together. 
We draw a sample of 2 members by drawing one of these pairs at 
random. Although each chip is equally likely to be included in 
the sample, the second condition is clearly not satisfied, as drawing
any one chip ensures that we shall draw that chip which is
attached to it.

It is also possible to satisfy the second condition and not the
first. For example, suppose that one of the 4 chips (which are
now separated) is stuck to the bottom of the bowl, and that this
fact precludes our drawing it in the sample of 2. Then, provided
we are as likely to draw any 2 of the remaining 3 as any
other 2, we have satisfied the second condition but not the first.
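The two chip examples can be examined numerically. The Python sketch below rests on one particular reading of the conditions, which is an assumption and not the text's own formulation: the first condition is taken to mean that every chip has the same proportion of samples containing it, and the second (in its modified form) that, for any given chip, the proportion of samples containing it is the same whichever other drawable chip is known to be in the sample. The chip labels and the function name `conditions` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

chips = ["A", "B", "C", "D"]  # hypothetical labels


def conditions(design):
    """Return (first_holds, second_holds) for a list of equally likely samples."""
    n = len(design)
    included = {c: Fraction(sum(c in s for s in design), n) for c in chips}
    first_holds = len(set(included.values())) == 1

    second_holds = True
    for x in chips:
        conditionals = set()
        for y in chips:
            if y != x and included[y] > 0:
                both = Fraction(sum(x in s and y in s for s in design), n)
                conditionals.add(both / included[y])  # proportion with x, given y
        if len(conditionals) > 1:
            second_holds = False
    return first_holds, second_holds


simple_random = [set(s) for s in combinations(chips, 2)]
stuck_in_pairs = [{"A", "B"}, {"C", "D"}]
stuck_to_bottom = [set(s) for s in combinations(["A", "B", "C"], 2)]  # D never drawn

print(conditions(simple_random))
print(conditions(stuck_in_pairs))
print(conditions(stuck_to_bottom))
```

Under this reading, the chips stuck together in pairs satisfy the first condition only, and the chip stuck to the bottom of the bowl leaves the second condition satisfied but not the first, in agreement with the two examples.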

Nearly all statistical inference assumes random sampling.¹
Unfortunately, however, the conditions for randomness often
cannot be satisfied. For example, a psychologist may wish to
draw a random sample of all human adults but may find that he
has available only college sophomores taking elementary psychology
at a certain college! In cases like this, the research
worker usually tries to make some judgment about whether the
bias introduced is at all relevant to the subject matter of his
research, and if he is willing to assume, on the basis of this judgment,
that his results will not be seriously biased, he goes ahead
with the sample which he has. This procedure is often quite
justifiable, as there are many problems in which one cannot see
any reason for results’ being biased by lack of randomness in
sampling. However, even when the first condition must be
violated, the research worker should, if at all possible, avoid the
violation of the second condition. For example, suppose a
psychologist is interested in a certain kind of phenomenon, which
can be produced under experimental conditions. He makes
three observations on each of 10 subjects. Can he say that he
has a random sample of 30 observations? Obviously not, as
including an observation on a given subject in the sample automatically
implies an inclusion of two additional observations on
the same subject. Treating a sample of n observations as
though it were random when actually it is made up of k different
sets of observations, each internally related, is an error which
accounts for a large proportion of invalid statistical inference,
because, although it is obvious enough when simple statistics
are used, it enters in a rather subtle manner when more complicated
statistical manipulations are employed.

¹ This is true even when the population is first divided into two or more parts
(“stratified”) and then sampled proportionately from each stratum, because in
stratified or “representative” sampling it is usually assumed that within each
stratum the members are drawn at random. It is possible to write down some
sampling theory for biased sampling, but such theory is useful only if the amount of
bias in sampling can be estimated, and this is ordinarily impossible.

EXERCISES 

2.2.1. Someone wishes to draw a random sample of 100 names from a book
containing 20,000 short biographies in 2,150 pages. Which of the following
procedures are random in the strict sense? If the procedure is not random,
explain why.

a. Putting each name inside a capsule, mixing these capsules thoroughly,
drawing a capsule randomly, drawing another capsule randomly, etc., until
100 capsules have been drawn.

b. Putting two names inside each capsule, mixing them thoroughly, drawing
a capsule randomly, another capsule randomly, . . . , until 50 capsules have
been drawn.

c. Putting each name inside a capsule, mixing them thoroughly, drawing
handfuls of capsules randomly until 100 capsules have been drawn.

d. Putting each combination of 100 names inside a capsule, mixing them
thoroughly, and drawing one capsule at random.

e. First choosing a page number by spinning a pointer (that can land on
any numeral from 0 to 9) four times (letting 0372 give page 372, etc., and
spinning the pointer again for any numeral that is “impossible”; for example,
if the first spin yields 3, the pointer will be spun again until a 0, 1, or 2 is
obtained), then spinning the pointer twice to obtain the particular name on
the page. The procedure is repeated 99 times.

f. Following the same procedure as in e, except that the name so chosen and
also the three following it (in the book) are taken, so that the procedure needs
to be gone through only 25 times.

g. Taking the first name on every page whose number is divisible by 21,
until 100 names are obtained.

h. Following the same procedure as in g, except choosing one of the first
9 pages as a starting page by spinning a pointer that can land on any numeral
from 1 to 9.

2.2.2. For a straw poll, names were selected at random from telephone 
directories. If the population to be sampled is the population of voters, what 
condition does this method violate? 

2.2.3. For a straw poll, a community is selected at random and then 50 people
in the community are selected at random. Only registered voters are
taken. Then another community is selected and an additional 50 voters are
taken. This procedure is continued until a sample of 10,000 is obtained.
Does this procedure violate one or both conditions of random sampling?
Explain your answer.

2.2.4. Discuss each of the following sampling methods with respect to
randomization:

a. In an agricultural survey, a field worker is assigned the job of sampling 
the weights of pigs of a certain age range and breed. He takes a sample of 
about 100 pigs by choosing at random 30 farmers in the area and then choosing 
at random several of each farmer’s pigs meeting his specifications. 

b. A physiologist obtains a sample of blood specimens at a university by 
getting a student to donate blood and then getting him to send other students 
to him. Each donor is in turn asked to send others; in this way 25 specimens
are obtained. 

c. On a large cattle ranch each animal has its own number. A sample of 
30 is selected by writing down 30 numbers “at random” and then finding the 
corresponding animals. 

d. In a study of working habits in a factory in which people work in pairs, a
sample of 100 man-hours is obtained by studying for 1 hr each of 50 pairs of
workers selected at random. The hour is selected at random from the working
day for each pair.

e. A machine turns out a certain product at the rate of 100 units per hour
and is operated for 8 hr per day. A sample of 48 units is taken each day by
taking one at the beginning of each 10-min period during the day.

f. In the case of the machine mentioned in e, a sample of 50 is taken by taking
5 units at each of 10 randomly selected time intervals during the day.

2.2.5. Find the expected sampling distribution of the arithmetic mean of 
random samples of size 2 from each of the following populations: 

a. (0,$X_1$), (0,$X_2$), (0,$X_3$), (0,$X_4$), (0,$X_5$)

b. (-1,$X_1$), (0,$X_2$), (0,$X_3$), (1,$X_4$)

c. (0,$X_1$), (0,$X_2$), (0,$X_3$), (1,$X_4$), (1,$X_5$)

d. (0,$X_1$), (0,$X_2$), (1,$X_3$), (1,$X_4$), (2,$X_5$)

2.2.6. For each of the populations given in Exercise 2.2.5, find the expected 
sampling distribution of the arithmetic mean of random samples of size 3. 

2.2.7. Do the same as in Exercise 2.2.5 for random samples of size 4. 

2.2.8. Taking as a statistic the largest value in the sample (instead of the 
arithmetic mean), find the expected sampling distribution for random samples 
of size 2 from each of the populations given in Exercise 2.2.5. 

2.2.9. Do the same as in Exercise 2.2.8 for random samples of size 3.

2.2.10. Do the same as in Exercise 2.2.8 for random samples of size 4. 

2.2.11. Find the expected sampling distribution of the number of infected 
children in random samples of 5 from a population of 100 children of whom 70 
are infected. 

2.2.12. Find the expected sampling distribution of the number of newspaper
readers in random samples of 10 from a population of 5,000 of whom
3,000 are newspaper readers. 

2.3. Probability. Suppose that we draw a member at random
from a finite population, that is, suppose we draw in such a
manner that we are as likely to draw one member as another.
We shall define¹ the probability that the member which we draw
will have a certain value x as the proportion of the population
having that value, that is, as f(x).

Definition. The probability of obtaining, in a random draw
from a finite population, a member having the value x is the proportion
of the population having that value, that is, f(x). The
probability of obtaining a member having as value something
not the value of any member is 0. Clearly 0 ≤ prob ≤ 1.

Examples. 1. In a college with 2,400 undergraduates there are only 15
majoring in mathematics. If we plan to select an undergraduate at random,
the probability that he will be a mathematics major is 15/2400 = 1/160 = .00625.
If there are no speech majors, the probability that he will be a speech major
is 0. If all students are majoring in something the probability that he will be
majoring in something is 1.

2. The probability of obtaining a 4 in a random toss of an ordinary six-sided
die is 1/6. (Here we are dealing with the population of six possible
results.)

3. If we throw an ordinary six-sided die twice, the probability of obtaining
a sum of 8 is 5/36. (Here we are dealing with the population of 36 possible
results; as we saw in Sec. 1.7, five of these results yield a sum of 8.)

Note that according to our definition the probability of drawing
at random a sample of size n that will have a statistic
(defined, of course, before drawing the sample) having the value
$z_i$ is simply $f(z_i)$ in the expected sampling distribution of the
statistic z for samples of size n. An expected sampling distribution
can thus also be called a probability distribution.
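Since probability is defined here as a proportion in a finite population, each of the three examples reduces to counting. A short Python sketch follows; the function name `probability` and the label `"other"` for students not majoring in mathematics are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def probability(population_values, x):
    """Proportion of the population having value x, that is, f(x)."""
    values = list(population_values)
    return Fraction(values.count(x), len(values))


# Example 1: 15 mathematics majors among 2,400 undergraduates.
# The label "other" for the remaining students is an assumption.
majors = ["mathematics"] * 15 + ["other"] * 2385

# Example 2: one toss of an ordinary six-sided die.
die = range(1, 7)

# Example 3: two throws, each result valued by the sum of the faces.
sums = [first + second for first, second in product(die, repeat=2)]

print(probability(majors, "mathematics"))
print(probability(majors, "speech"))
print(probability(die, 4))
print(probability(sums, 8))
```

A value that no member has, such as "speech" in the first example, simply has a count of zero and therefore probability 0.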

EXERCISES 

2.3.1. Five books, each by a different author, are arranged at random on a 
shelf. What is the probability that, reading from left to right, they will be in 
alphabetical order according to the author? (Each author has a different 
name.) 

¹ This is not by any means the only way of defining “probability.” There are
difficulties with any definition that has been proposed; one difficulty with the definition
given above lies in the vagueness of the expression “as likely to draw one member
as another.” For an interesting discussion of the diverse approaches to this
controversial subject, see the monograph by Ernest Nagel, Principles of the theory
of probability (International Encyclopedia of Unified Science, vol. 1, no. 6), Chicago:
University of Chicago Press, 1939.

2.3.2. Ten people are paired at random in a tournament (making five pairs).
What is the probability that Mr. Smith will be paired with Mr. Jones?

2.3.3. In a group of 90 people there are twice as many nonscientists as scientists.
What is the probability that if a random sample of 7 is taken from this
group it will contain 

a. exactly 3 scientists? 

b. at most 2 scientists? 

c. at least 4 scientists? 

d. no scientists? 

2.3.4. If a monkey in a learning experiment chooses one of two objects (one 
containing food and the other not) at random on each of 10 trials, what is the 
probability that he will make exactly 8 correct choices (that is, choices of the 
object containing food)? Exactly 5 correct choices? 

2.3.5. A sequence of two cards is to be drawn at random from an ordinary 
pack of 52 cards containing 13 spades, 13 hearts, 13 diamonds, and 13 clubs. 
Find the probability of drawing

a. first a spade and second a diamond

b. first either a spade or heart and second a diamond

c. first either a spade or a heart and second a heart 

d. first a card which is not a heart and second a spade 

2.3.6. Find the same probabilities as in Exercise 2.3.5 except under the condition
that the first card is replaced before drawing the second.

2.3.7. A certain psychologist believes that if a subject is placed in a certain
experimental situation a reaction of kind A is just as likely to occur as a reaction
of kind B, and that if a sequence of reactions is obtained each reaction is
independent of every other reaction. If this is true, what is the probability

a. that a subject will give exactly 4 reactions of kind A and 4 reactions of 
kind B in a sequence of 8 reactions? 

b. that he will give exactly 7 reactions of kind A in a sequence of 8 reactions? 

c. that he will give at most 7 reactions of kind A in a sequence of 8 reactions? 

d. that he will give at least 2 reactions of kind A in a sequence of 6 reactions? 

2.3.8. A bowl contains 4 orange chips, 3 black chips, and 2 white chips. If 
two chips are to be drawn at random, what is the probability

a. that neither is orange?

b. that one is orange and one is black?

c. that at least one is black? 

d. that at most one is white? 

2.3.9. If three chips are to be drawn at random from the bowl mentioned in 
Exercise 2.3.8, what is the probability 

a. that one is orange, one is black, and one is white?

b. that one is orange and two are black?

c. that at least one is orange? 

d. that at least one is orange and at least one is black? 

e. that at most one is orange and at least one is black? 

f. that at most one is orange and at most one is black?

2.3.10. If four chips are to be drawn from the bowl mentioned in Exercise
2.3.8, what is the probability 

a. that 2 are orange and 1 is black and 1 is white?

b. that 2 are orange and 2 are white?

c. that at least 2 are orange and at most 1 is black?

d. that at least 1 is orange and at most 2 are black and at most 1 is white?

e. that at most 2 are orange and at most 2 are black and at least 1 is white? 

2.3.11. A bowl contains 3 orange and 1 black chips. A second bowl contains
1 orange and 1 black chip. If a chip is to be drawn at random from the
first bowl and placed into the second bowl and then a chip is to be drawn at 
random from the second bowl, what is the probability 

a. that the chip drawn from the second bowl will be orange? 

b. that the chip drawn from the second bowl will be black? 

c. that the chip drawn from the second bowl will be of the same color as the 
chip drawn from the first bowl? 

d. that the chip drawn from the second bowl will be of a different color from 
the chip drawn from the first bowl? 

e. that the chip drawn from the first bowl will be orange and the chip drawn 
from the second bowl will be black? 

f. that the chip drawn from the first bowl will be black and the chip drawn
from the second bowl will be orange?

g. that both chips will be orange? 

h. that both chips will be black? 

i. that one of the chips will be orange and the other will be black?

j. that at least one of the chips drawn will be orange? 

k. that at most one of the chips drawn will be black? 

2.3.12. An urn contains 5 white and 2 black balls. A second urn contains 
2 white and 1 black balls. It is decided that a ball will be drawn at random 
from the first urn and placed in the second urn; then a ball will be drawn at 
random from the second urn. What is the probability 

a. that the ball drawn from the second urn will be white? 

b. that the ball drawn from the second urn will be black? 

c. that the ball drawn from the second urn will be the same color as the 
ball drawn from the first urn? 

d. that the ball drawn from the second urn will be a different color from 
that of the ball from the first urn? 

e. that both balls will be white? 

f. that both balls will be black?

g. that at most one will be white?

h. that at least one will be white?

2.3.13. A bowl contains 9 orange and 16 black chips. A second bowl contains
4 orange and 9 black chips. Two chips are to be drawn at random from
the first bowl and placed into the second and then a chip is to be drawn at 
random from the second bowl. What is the probability 

a. that the chip drawn from the second bowl will be orange?

b. that the chip drawn from the second bowl will be black?

c. that the chip drawn from the second bowl will be of the same color as
exactly one of those drawn from the first bowl?

d. that the chip drawn from the second bowl will be of a different color
from both those drawn from the first bowl?

e. that all three chips will be orange?

f. that at least one chip will be orange?

g. that at least one chip will be orange and at least one chip will be black?

2.4. The Calculus of Probabilities. Consider two events such
that the first event can be $A$ or $\bar{A}$ (not $A$) and the second event
$B$ or $\bar{B}$. Now suppose there are $n$ equally likely different ways
in which these two events can occur, of which $m_{11}$ are both $A$ and
$B$, $m_{12}$ are $A$ and $\bar{B}$, $m_{21}$ are $\bar{A}$ and $B$, and
$m_{22}$ are $\bar{A}$ and $\bar{B}$. This situation can be represented by
the accompanying table:

                                  Second event
                        $B$                  $\bar{B}$            Total

First    $A$            $m_{11}$             $m_{12}$             $m_{11} + m_{12}$
event    $\bar{A}$      $m_{21}$             $m_{22}$             $m_{21} + m_{22}$

         Total          $m_{11} + m_{21}$    $m_{12} + m_{22}$    $n$

Now consider a population whose $n$ members are pairs of
events. Since each pair is equally likely, the probability that $X$
will be a pair whose first member is $A$ is

$$P(A) = \frac{m_{11} + m_{12}}{n}$$

Similarly,

$$P(\bar{A}) = \frac{m_{21} + m_{22}}{n}$$

$$P(A) + P(\bar{A}) = \frac{m_{11} + m_{12} + m_{21} + m_{22}}{n} = 1$$

or

$$P(\bar{A}) = 1 - P(A)$$

Similarly, $P(A \text{ and } B) = m_{11}/n$, $P(A \text{ and } \bar{B}) = m_{12}/n$, etc.

Now suppose we raise the question, “What is the probability
that the second event will be $B$ if the first event is $A$?” This is
the same as asking, “Of those pairs having $A$ as the value of the
first member, what proportion have $B$ as the value of the second
member?” The answer is clearly $m_{11}/(m_{11} + m_{12})$. We abbreviate
“probability that the second event will be $B$ if the first
event is $A$” by $P(B \mid A)$ (read “probability that $B$, given $A$”).
We have then

$$P(B \mid A) = \frac{m_{11}}{m_{11} + m_{12}}$$

But

$$P(A \text{ and } B) = \frac{m_{11}}{n}$$

and

$$P(A) = \frac{m_{11} + m_{12}}{n}$$

Hence

$$P(A \text{ and } B) = P(A)\,P(B \mid A)$$

or

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

$P(B \mid A)$ is also called the conditional probability of $B$, given $A$.
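The relations above can be checked numerically. The following is a minimal Python sketch, assuming hypothetical cell counts; the function name `conditional_from_counts` and the counts 3, 5, 2, 10 are illustrative and do not come from the text.

```python
from fractions import Fraction


def conditional_from_counts(m11, m12, m21, m22):
    """Return P(A), P(A and B), and P(B | A) from the four cell counts."""
    n = m11 + m12 + m21 + m22          # number of equally likely pairs
    p_a = Fraction(m11 + m12, n)       # first member is A
    p_a_and_b = Fraction(m11, n)       # first member A, second member B
    p_b_given_a = p_a_and_b / p_a      # proportion of A-pairs that have B
    return p_a, p_a_and_b, p_b_given_a


# Hypothetical counts, chosen only for illustration.
p_a, p_a_and_b, p_b_given_a = conditional_from_counts(3, 5, 2, 10)
print(p_a, p_a_and_b, p_b_given_a)
```

Exact fractions are used so that the product rule $P(A \text{ and } B) = P(A)\,P(B \mid A)$ holds without rounding error.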


Examples. 1. There are 4 white and 3 black balls in an urn. What is the
probability of drawing a white ball and then (without replacement) drawing a
black ball? Let $A$ denote drawing a white ball and $B$ denote drawing a black
ball. Then

$$P(A) = \tfrac{4}{7}$$
$$P(B \mid A) = \tfrac{3}{6} = \tfrac{1}{2}$$
$$P(A \text{ and } B) = (\tfrac{4}{7})(\tfrac{1}{2}) = \tfrac{2}{7}$$

We could have obtained this result directly by considering that there are
(4)(3) = 12 ways of drawing a white and then a black ball and there are (7)(6)
ways of drawing 2 balls; thus

$$P(A \text{ and } B) = \frac{(4)(3)}{(7)(6)} = \frac{2}{7}$$


2. From an ordinary deck of 52 cards we draw a card and find that it is a
heart. What is the probability that the next card drawn (without replacement
of the first) will be a spade? Let $A$ denote drawing a heart on the first
draw and $B$ denote drawing a spade on the second draw. Then

$$P(A \text{ and } B) = \frac{(13)(13)}{(52)(51)} = \frac{13}{204}$$

$$P(B \mid A) = \frac{(13)(13)/(52)(51)}{13/52} = \frac{13}{51}$$

This result could be obtained directly by considering that after the first draw
there are 13 spades out of 51 cards left; thus

$$P(B \mid A) = \frac{13}{51}$$

3. What is the probability that if the first card drawn is not a heart the
second card will be a spade? This is a more difficult problem than either of
the above. Let $A$ denote not getting a heart. Clearly $P(A) = \tfrac{39}{52}$. Let $B$
denote getting a spade. To obtain $P(A \text{ and } B)$ consider that there are (13)(12)
ways of obtaining $A$ and $B$ with the first card a spade and (26)(13) ways of
obtaining $A$ and $B$ with the first card a club or a diamond. Thus there are
(13)(12) + (13)(26) = (13)(38) ways of obtaining $A$ and $B$. Thus

$$P(A \text{ and } B) = \frac{(13)(38)}{(52)(51)} = \frac{19}{102}$$

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{19/102}{39/52} = \frac{38}{153}$$

Notice that without any restriction at all on the result of the first draw we
would have $P(B) = \tfrac{13}{52}$.
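Examples 2 and 3 can also be checked by brute-force enumeration of all ordered pairs of cards. The sketch below is one way to do this in Python; the helper `prob` and the event functions are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations

SUITS = ["spade", "heart", "diamond", "club"]
# 52 distinct cards: 13 ranks in each of 4 suits.
deck = [(suit, rank) for suit in SUITS for rank in range(13)]


def prob(event, given=lambda pair: True):
    """P(event | given) over all equally likely ordered pairs of cards."""
    pairs = [p for p in permutations(deck, 2) if given(p)]
    hits = sum(1 for p in pairs if event(p))
    return Fraction(hits, len(pairs))


def first_is_heart(pair):
    return pair[0][0] == "heart"


def first_not_heart(pair):
    return pair[0][0] != "heart"


def second_is_spade(pair):
    return pair[1][0] == "spade"


example_2 = prob(second_is_spade, given=first_is_heart)
example_3 = prob(second_is_spade, given=first_not_heart)
unrestricted = prob(second_is_spade)
print(example_2, example_3, unrestricted)
```

Restricting the list of pairs to those satisfying the condition is exactly the "of those pairs having $A$ as first member, what proportion have $B$" reading of conditional probability.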

Now let the first event be tossing a die and the second event be
drawing a card. Let $A$ denote obtaining a 5 and $B$ denote drawing
a king. What is $P(A \text{ and } B)$? In this case $P(B \mid A) = P(B)$,
because the probability of drawing a king is unaffected by the
fact that one has obtained a 5. Thus

$$P(A \text{ and } B) = P(B \mid A)P(A) = P(B)P(A) = (\tfrac{1}{13})(\tfrac{1}{6}) = \tfrac{1}{78}$$

Definition. $A$ and $B$ are independent events if and only if
$P(B \mid A) = P(B)$.

It can be proved that if $P(B \mid A) = P(B)$, then $P(A \mid B) = P(A)$
as follows:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A \text{ and } B)}{P(B \mid A)} = P(A)$$
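The die-and-card case gives a small check of the definition of independence. The following Python sketch enumerates the 312 equally likely (die face, card) pairs; the names `outcomes`, `prob`, `is_five`, and `is_king` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spade", "heart", "diamond", "club"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]

# Each outcome is a pair: (face of the die, card drawn).
outcomes = list(product(range(1, 7), deck))


def prob(event):
    """Probability of an event over the equally likely (die, card) pairs."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_five(outcome):
    return outcome[0] == 5


def is_king(outcome):
    return outcome[1][0] == "K"


p_a = prob(is_five)
p_b = prob(is_king)
p_a_and_b = prob(lambda o: is_five(o) and is_king(o))
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b
print(p_a, p_b, p_a_and_b, p_b_given_a, p_a_given_b)
```

Both conditional probabilities equal the corresponding unconditional ones, as the proof above requires.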

As an exercise the student should prove by mathematical
induction that for a sequence of $k$ events, $A_1$ or $\bar{A}_1$, $A_2$ or $\bar{A}_2$,
. . . , $A_k$ or $\bar{A}_k$, respectively,

$$P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \text{ and } A_2) \cdots P(A_k \mid A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_{k-1})$$

When all events are independent this becomes simply

$$P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) = P(A_1)P(A_2) \cdots P(A_k)$$

Suppose we consider our table again. By “$A$ or $B$” we mean
either $A$ occurs or $B$ occurs or both occur. Then

$$P(A \text{ or } B) = \frac{m_{11} + m_{12} + m_{21}}{n} = \frac{m_{11} + m_{12}}{n} + \frac{m_{11} + m_{21}}{n} - \frac{m_{11}}{n} = P(A) + P(B) - P(A \text{ and } B)$$
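The addition rule can be illustrated on a single card, treating "heart" and "king" as the two attributes. This is a hedged Python sketch, not an example from the text; the deck encoding and the function names are assumptions.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spade", "heart", "diamond", "club"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def prob(event):
    """Probability of an event for one card drawn at random."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_heart(card):
    return card[1] == "heart"


def is_king(card):
    return card[0] == "K"


p_a = prob(is_heart)
p_b = prob(is_king)
p_a_and_b = prob(lambda c: is_heart(c) and is_king(c))
p_a_or_b = prob(lambda c: is_heart(c) or is_king(c))
print(p_a_or_b, p_a + p_b - p_a_and_b)
```

Subtracting $P(A \text{ and } B)$ removes the king of hearts, which would otherwise be counted twice.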

EXERCISES 

2.4.1. There are twice as many men as women enrolled in course A. There 
are equally many men and women enrolled in course B . No person is enrolled 
in both courses. One student is to be drawn at random from each course. 
Find the probability that 

a. both will be women.

b. at least one will be a woman.

c. exactly one will be a woman. 

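2.4.2. If $E_1$ (the first event) can be $A$ or $\bar{A}$, $E_2$ can be $B$ or $\bar{B}$, and $E_3$ can be
$C$ or $\bar{C}$, prove that if $E_1$, $E_2$, and $E_3$ are to occur,

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C)$$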

2.4.3. Suppose that one-third of the members of a large community are 
laborers, and that three-fourths of these laborers want a change in city
administration. Suppose that of those who are not laborers, only one-eighth want a
change. Find the probability that 

a. a person picked at random from this community will be a laborer who 
does not want a change. 

b. a person picked at random will be either a laborer who does not want a 
change or a nonlaborer who does want a change. 

c. a person picked at random will not want a change. 

d. if two people are picked at random, one will be a laborer who does not 
want a change and the other will be a nonlaborer who does. 

e. if two people are picked at random one will be a laborer who wants a 
change and the other will be a nonlaborer who does not. 

f. if two people are picked at random, but in order, the first will be a laborer
who wants a change and the second will be a nonlaborer who does not. 

g. if three people are picked at random, exactly one will be a nonlaborer 
and exactly two will want changes. 

h. a person picked at random either will be a laborer or will want a change. 

2.4.4. A monkey is seated at a typewriter which contains only the 26 letters 
of the alphabet, striking keys at random. Find the probability that in a given 
sequence of 10 letters 


a. the expression “monkeytype” will appear.

b. the word “monkey” will appear (all letters adjacent).

c. the letter m will appear at least once. 

2.4.5. One-tenth of a large number of radio tubes are defective. What is 
the probability that if five tubes are randomly selected, none will be defective? 

2.4.6. In a certain kind of vaccination, a vaccine to which 40 per cent of all 
people react is first given; if there is no reaction, a second vaccine is given to 
which 90 per cent of the remainder react. What is the probability that a
person chosen at random will react only to the second vaccine?

2.4.7. In an oral examination a student is asked a question selected
randomly from a large number. Only if he answers satisfactorily is he asked a
second one, selected randomly from the remainder. If he actually can 
answer only 30 per cent of the questions, what is the probability that he will 
miss the second? 

2.4.8. In an experiment on extrasensory perception five cards, each with 
a different symbol, are thoroughly shuffled and then placed in a pile. A 
subject, who knows the symbols, is to attempt to tell the order. Assuming 
that he has no ESP but guesses a sequence for the five symbols, what is the 
probability 

a. that the subject will get the order entirely correct? 

b. that he will get exactly three of the five correct? 

2.4.9. In the same situation as the above, except that two red, two black, 
and two white cards are used, and assuming the subject has no ESP and 
guesses a possible sequence of six colors (two of each), what is the probability 
that 

a. he will get the order entirely correct? 

b. he will miss only two? 

2.4.10. In an ESP experiment a coin was tossed five times. Each of 1,000 
subjects (in a large auditorium but shielded so that no one could possibly 
receive any cues from any other subject or from the experimenter) wrote down 
a sequence. Exactly 173 subjects wrote down the correct sequence. The 
experimenter claimed that he had statistically significant results and that these 
results were evidence for ESP. 

Write down the formula which the experimenter probably used (or
approximated) in calculating his statistic. Why does this formula not apply? What
would have to be done to make a valid statistical test?

3.1. Expected Sampling Distributions Yielded by Different
Hypotheses. Suppose that we have a bowl containing 8 chips,
each being either orange or black. What is the expected sampling
distribution of the number of orange chips in a sample for
samples of size 4? Obviously we cannot answer this question,
since we do not know how many of the 8 chips are orange; however,
we can form hypotheses about the number of orange chips
in the bowl and can find out what the expected sampling
distribution would be on the basis of each hypothesis. If there are
no orange chips in the bowl, the sampling distribution of the
number of orange chips in a sample of size 4 is simply (1, 0). If
there is 1 orange chip in the bowl, the expected sampling
distribution is $(C^{1}_{0}C^{7}_{4}/C^{8}_{4},\ 0)$, $(C^{1}_{1}C^{7}_{3}/C^{8}_{4},\ 1)$. If there are 2 orange
chips in the bowl the sampling distribution is $(C^{2}_{0}C^{6}_{4}/C^{8}_{4},\ 0)$,
$(C^{2}_{1}C^{6}_{3}/C^{8}_{4},\ 1)$, $(C^{2}_{2}C^{6}_{2}/C^{8}_{4},\ 2)$, etc. The expected sampling
distribution corresponding to each hypothesis is shown in the
appropriate row of Table 3.1.1, $H_i$ indicating the hypothesis that
there are $i$ orange chips in the bowl.
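The rows of Table 3.1.1 follow directly from the counting formula just described. The following Python sketch computes them; `sampling_distribution` and its parameter names are illustrative, and `math.comb` plays the role of the symbol $C^{n}_{r}$.

```python
from math import comb


def sampling_distribution(orange_in_bowl, bowl_size=8, sample_size=4):
    """Probabilities of 0, 1, ..., sample_size orange chips in the sample,
    drawing without replacement, on the hypothesis H_i with
    i = orange_in_bowl."""
    total = comb(bowl_size, sample_size)
    return [
        comb(orange_in_bowl, x)
        * comb(bowl_size - orange_in_bowl, sample_size - x)
        / total
        for x in range(sample_size + 1)
    ]


# One row per hypothesis H_0, ..., H_8.
table = {i: sampling_distribution(i) for i in range(9)}
for i, row in table.items():
    print(f"H_{i}: " + "  ".join(f"{p:.4f}" for p in row))
```

`math.comb(n, r)` returns 0 when `r` exceeds `n`, which yields the zero entries of the table without special cases.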

EXERCISES 

3.1.1. A sample of 2 is to be chosen at random from a dichotomous
population containing 4 members (each either an A or a non-A). Make a table
showing the expected sampling distribution of the number of A’s in a sample
on the basis of each possible hypothesis.

3.1.2. A sample of 3 is to be chosen at random from a population containing
24 members, each either an A or a B. Make a table showing the expected
sampling distribution of the number of A’s per sample for the hypothesis

a. that the population contains 3 A’s 

b. that the population contains 12 A’s 

c. that the population contains 21 A’s 

Table 3.1.1. Expected Sampling Distribution of Number of
Orange Chips in Sample of Four Drawn from Bowl
Containing 8 Chips, for Each Possible Hypothesis
as to Number of Orange Chips in Bowl

                  Number of orange chips in sample
Hypothesis      0         1         2         3         4

$H_0$           1         0         0         0         0
$H_1$           .5000     .5000     0         0         0
$H_2$           .2143     .5714     .2143     0         0
$H_3$           .0714     .4286     .4286     .0714     0
$H_4$           .0143     .2286     .5143     .2286     .0143
$H_5$           0         .0714     .4286     .4286     .0714
$H_6$           0         0         .2143     .5714     .2143
$H_7$           0         0         0         .5000     .5000
$H_8$           0         0         0         0         1

Each entry is $C^{i}_{x}C^{8-i}_{4-x}/C^{8}_{4}$ for hypothesis $H_i$ and sample value $x$.

3.1.3. Four Ph.D. candidates are to be chosen at random from a group of 32
and given an extensive series of tests. Each tested candidate is given a final
rating as excellent, satisfactory, or unsatisfactory. Make a table showing the
expected sampling distribution of the number of excellent candidates per
sample for the hypothesis

a. that the population contains 4 excellent candidates

b. that the population contains 16 excellent candidates

c. that the population contains 24 excellent candidates

Why is it necessary that each tested candidate be rated independently of
the ratings of the others in order that this table have any validity?

3.1.4. A sample of blood specimens is to be chosen at random from a
population of 21 and subjected to a thorough bioassay. On the basis of the
measurements, a complex ratio for each specimen is computed. Ratios of at
least % are considered of significance. Make a table showing the expected
sampling distribution of the number of ratios of at least % for the hypothesis

a. that the population contains 3 such ratios 

b. that the population contains 7 such ratios 

c. that the population contains 14 such ratios 

3.1.5. An investor finds that nine stocks meet certain stringent requirements 
which should indicate an increase in value in the near future. He decides to 
choose three of these at random and invest in each of the three. Make a table 
showing the expected sampling distribution of the number of profitable stocks 
per sample for each of the theoretically possible hypotheses. 

3.1.6. Fifty sites have been chosen as favorable for the drilling of oil wells. 
It is decided to drill wells on four of these, chosen at random. Find the 
expected sampling distribution of the number of strikes per sample on the 
hypothesis that 21 of the 50 sites actually contain oil. 

3.1.7. Write a formula giving the expected sampling distribution under each 
of the following conditions: 

a. A sample of 8 drawn from 180 bombers on the hypothesis that 100 of the 
bombers fail to meet all specifications 

b. A sample of 15 drawn at random from 365 families on the hypothesis 
that 190 of the families have authoritarian structures 

c. A sample of 25 drawn at random from 2,000 school teachers on the
hypothesis that one-half of the teachers are in favor of abolishing examinations

d. A sample of size $n$ drawn from a population containing $N$ members on
the hypothesis that $N_1$ members are A’s

3.2. Testing Hypotheses about the Population. Now suppose
that, in complete ignorance about the actual contents of the
bowl, we take a sample of 4 chips at random and observe that all
4 are orange. By referring to our table we see that if there were
only 4 orange chips in the bowl ($H_4$) then the probability of
drawing all 4 of them was only .0143, that is, out of all the
$C^{8}_{4} = 70$ different samples that we could have drawn, only one of
them would have been 4 orange chips. Therefore, we would be
rather suspicious of $H_4$. On the other hand, we would be less
suspicious of $H_5$, since $f(4) = .0714$ on the basis of $H_5$. We
would regard $H_6$, $H_7$, and $H_8$ each as quite tenable. One might
be tempted to say that we should regard $H_8$ as the most tenable
hypothesis, in view of the sample which we had drawn. If we
should follow this procedure, however, we should have to say
that $H_0$ is the most tenable hypothesis if our sample yields no
orange chips, $H_2$ is the most tenable if our sample yields 1 orange
chip, $H_4$ if 2, $H_6$ if 3, and $H_8$ if 4. Under no circumstances
therefore would we be able to consider $H_1$, $H_3$, $H_5$, or $H_7$ the most
tenable hypothesis, even though one of these were in fact the
correct one! Before discussing this difficulty further let us
consider a somewhat more interesting example.

Suppose that we know only that a bowl contains 4 chips, each
either orange or nonorange. We plan to draw a chip at random,
replace it, draw another, replace it, etc., drawing 10 times in all.
What kind of inference can we make about the contents of the
bowl? We can consider our 10 draws (in sequence) as a random
sample of size 1 from the population consisting of all possible
sequences of 10 draws, each sequence having as value the number
of orange chips drawn. There are $4^{10}$ = 1,048,576 sequences in
the population (by Theorem 1.8.1). If there are no orange chips
in the bowl, all these sequences have the value 0. If there is only
one orange chip in the bowl ($H_1$), only $3^{10}$ = 59,049 of these
sequences have the value 0, because there are 3 ways of drawing
the first nonorange chip, 3 ways of drawing the second, . . . ,
3 ways of drawing the tenth. To see how many of the 1,048,576
sequences would have, on $H_1$, the value 1, consider that, if we
draw the orange chip first, there are $3^9$ ways of completing the
sequence; in other words, there are $3^9$ = 19,683 sequences with
an orange chip in the first place and nonorange chips in the
other places. Similarly, there are $3^9$ sequences with an orange
chip in the second place, $3^9$ sequences with an orange chip in the
third place, . . . , $3^9$ sequences with an orange chip in tenth place.
Thus on $H_1$ there are $10(3^9)$ = 196,830 sequences having the
value 1. Similarly, on $H_1$ there are $3^8$ sequences with orange
chips in the first and second places, $3^8$ sequences with orange
chips in the first and third places, . . . , $3^8$ sequences with
orange chips in the ninth and tenth places. Since there are
$C^{10}_{2}$ pairs of places, there are $C^{10}_{2}(3^8)$ = 295,245 sequences
having the value 2. On $H_1$ there are $C^{10}_{3}(3^7)$ = 262,440 sequences
having the value 3, etc. In general, on the hypothesis $H_i$ there
are $C^{10}_{x}(i)^{x}(4 - i)^{10-x}$ sequences having the value $x$. The
distribution (fr.f.) of the population of sequences on each
hypothesis is given in Table 3.2.1.
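The counting argument above translates directly into a short computation. The following Python sketch tabulates, for each hypothesis $H_i$, the number of sequences having each value $x$ and the corresponding relative frequencies; the function names are assumptions made for illustration.

```python
from math import comb


def sequence_counts(orange, chips=4, draws=10):
    """Number of ordered sequences of `draws` draws (with replacement)
    containing exactly x orange chips, for x = 0, ..., draws."""
    return [
        comb(draws, x) * orange**x * (chips - orange) ** (draws - x)
        for x in range(draws + 1)
    ]


def relative_frequencies(orange, chips=4, draws=10):
    """Relative frequency of each value x among all chips**draws sequences."""
    total = chips**draws
    return [count / total for count in sequence_counts(orange, chips, draws)]


for i in range(5):
    row = relative_frequencies(i)
    print(f"H_{i}: " + " ".join(f"{p:.4f}" for p in row))
```

Python evaluates `0**0` as 1, so the rows for $H_0$ and $H_4$ come out as a single entry of 1 with zeros elsewhere.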

Now suppose that we actually draw a sequence of 10 chips,
replacing after each draw, and observe 4 orange chips. By
reference to our table we see that we can consider $H_1$ and $H_2$
each as quite tenable and that we can reject $H_3$ with considerable
confidence (assuming that there is no other reason for considering
$H_3$ as particularly reasonable). But just how confident
should we be in rejecting $H_3$ in this particular case? Can we say
that the number .0162 expresses the level of confidence with
which we reject $H_3$? To answer this question, consider the
following situation. Suppose we place 3 orange and 1 black
chips in a bowl, that is, suppose we set up a situation in which
$H_3$ is actually correct. Now suppose we were to draw a great
many sequences, each time referring to our table and rejecting
$H_3$ if the number of orange chips is 4, 3, 2, 1, or 0. Since $H_3$ is
correct, each of our rejections will be a false assertion, and
of our total number of assertions there will be approximately
.0162 + .0031 + .0004 + .0000 + .0000 = .0197 false ones, if
each possible sequence is drawn just as often as each other
possible sequence. Therefore, since rejecting $H_3$ with the
observation of 4 orange chips in the sample commits us to the
rejection of $H_3$ with the observation of 3, 2, 1, or 0 orange chips,
in rejecting $H_3$ with the observation of 4 orange chips we should
consider ourselves as operating at the .0197 level of confidence
rather than at the .0162 level. Similarly, if we plan to reject
$H_2$ if there are 0, 1, 9, or 10 orange chips in the sequence, we
must state, in rejecting $H_2$ with the observation of 1 orange chip
in the sequence, that we reject $H_2$ at the

.0098 + .0098 + .0010 + .0010 = .0216

level of confidence, not the .0098 level.
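The level of confidence attached to a rejection rule is the total relative frequency, on the tested hypothesis, of all sample values that lead to rejection. The Python sketch below computes this for the two rules discussed above; `rejection_level` is a name introduced here. Note that the text adds four-place rounded entries, so its .0216 differs in the last place from the unrounded sum 22/1024, which rounds to .0215.

```python
from math import comb


def relative_frequencies(orange, chips=4, draws=10):
    """Relative frequency of x orange chips in `draws` draws with replacement."""
    total = chips**draws
    return [
        comb(draws, x) * orange**x * (chips - orange) ** (draws - x) / total
        for x in range(draws + 1)
    ]


def rejection_level(orange, rejection_values):
    """Probability of rejecting H_i (i = orange) when H_i is in fact correct."""
    freq = relative_frequencies(orange)
    return sum(freq[x] for x in rejection_values)


level_h3 = rejection_level(3, [0, 1, 2, 3, 4])
level_h2 = rejection_level(2, [0, 1, 9, 10])
print(f"{level_h3:.4f} {level_h2:.4f}")
```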

The heavy black lines in the table include all those entries such
that the hypothesis at the left is tenable if the sample value at the
top is observed. The choice of which cells to include within the
black lines is more or less arbitrary; however, it is conventional
never to choose a confidence level which is greater than .05; in
many cases .01 or even .001 is the level chosen.¹ In our table
we can observe before drawing our sequence that if $H_0$ is correct
we are in no danger at all of making a false assertion; if $H_1$ is
correct then the probability that we are going to make a false
assertion is .0197; if $H_2$ is correct, the probability is .0216; if $H_3$
is correct, the probability is .0197; if $H_4$ is correct, the
probability is 0. Thus we can say before drawing our sample that
no matter which hypothesis is correct the probability that we shall
make a false assertion, that is, reject a hypothesis when it is the
correct one, is some number not greater than .0216, the largest
of the above probabilities. After drawing our sample, however,
we should not speak of the probability that our assertion is false;
we should say simply that we are rejecting a hypothesis at such
and such a level of confidence.

¹ The level of confidence chosen depends upon the purposes of the research
worker. This question is discussed in Sec. 3.8.

EXERCISES 

3.2.1. A bowl contains 100 chips. One chip is drawn at random; it is red.
Test the hypothesis that

a. the bowl contained only 1 red chip.

b. the bowl contained only 2 red chips.

c. the bowl contained only 20 red chips.

d. the bowl contained all red chips.

3.2.2. A bowl contains 100 chips. Two chips are drawn at random. Both
are black. Test the hypothesis that

a. the bowl contained only 2 black chips.

b. the bowl contained only 3 black chips.

c. the bowl contained only 10 black chips.

d. the bowl contained only 20 black chips.

3.2.3. A bowl contains 100 chips. One chip is drawn at random and
replaced, then another chip is drawn. Both are black. Test the same
hypotheses listed in Exercise 3.2.2.

3.2.4. A monkey makes 10 correct choices in a sequence of 12 choices.
Test the hypothesis that the monkey is making choices at random. (Assume
that in each trial there are exactly as many possible incorrect choices as
correct.)

3.2.5. From a list of professors in a certain university we select a name at
random, then we select another name at random (which may be the same as 
the first name), etc., selecting six names in all. We find that all six of these 
professors are from the East. Test the hypothesis that in the university 
there are as many non-Eastern professors as there are Eastern ones. 

3.3. Confidence Intervals. The heavy black lines in the previous
table indicate, for each value of the sample, which hypotheses
are reasonable, using the .022 level of confidence. (See
Sec. 3.2 for the way in which the number .022 is obtained.)
Thus, for an observed value of 0 we say, at the .022 level of
confidence, that the number of orange chips in the bowl is either
0 or 1; for an observed value of 1 we say, at the .022 level of
confidence, that the number of orange chips in the bowl is 1; for an
observed value of 2 we say, at the .022 level of confidence, that
the number of orange chips in the bowl is 1 or 2; etc. The
intervals 0-1 (inclusive), 1, 1-2, etc., are confidence intervals.
We can say, before drawing our sequence of 10 chips, that the
probability that we are going to make a false assertion about the
limits within which the true number of orange chips lies is less
than .022. Another way of stating this is as follows: Probability
(our confidence interval will include the actual number of
orange chips) > .978.
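A confidence interval of this kind is simply the set of hypotheses not rejected by the observed value. The Python sketch below builds these sets for the 10-draw problem. The rejection regions in `REJECT` are an assumption inferred from the probabilities quoted in Secs. 3.2 and 3.4, since the table with the heavy black lines is not reproduced here.

```python
from math import comb

# Assumed rejection regions (inferred from the text): the values of x,
# the number of orange chips in 10 draws, for which each H_i is rejected.
REJECT = {
    0: set(range(1, 11)),
    1: set(range(6, 11)),
    2: {0, 1, 9, 10},
    3: set(range(0, 5)),
    4: set(range(0, 10)),
}


def relative_frequencies(orange, chips=4, draws=10):
    """Relative frequency of x orange chips in `draws` draws with replacement."""
    total = chips**draws
    return [
        comb(draws, x) * orange**x * (chips - orange) ** (draws - x) / total
        for x in range(draws + 1)
    ]


def confidence_set(observed):
    """Hypotheses H_i (listed by i) that remain tenable for the observed value."""
    return [i for i in range(5) if observed not in REJECT[i]]


def false_rejection_probability(i):
    """Probability of rejecting H_i when H_i is correct."""
    freq = relative_frequencies(i)
    return sum(freq[x] for x in REJECT[i])


worst_case = max(false_rejection_probability(i) for i in range(5))
for x in range(11):
    print(x, confidence_set(x))
print(f"largest false-rejection probability: {worst_case:.4f}")
```

Whatever the true number of orange chips, the chance that the resulting set misses it is at most `worst_case`, which is the sense in which the text speaks of the .022 level.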


EXERCISES 

3.3.1. From a bowl containing 3 chips, each either orange or black, we
draw at random a sequence of 7 chips, of which 6 are orange. Find the .03
confidence interval for the number of orange chips in the bowl.

3.3.2. From a bowl containing 3 chips, each either orange or black, we draw
at random a sequence of 20 chips, of which 18 are black. Find the .01
confidence interval for the number of orange chips in the bowl.

3.4. Power of a Test. In the case discussed in Sec. 3.3, suppose
that the number of orange chips in the bowl is actually
0. In this case we are going to observe 0 as the value of our
sequence; the .022 confidence interval for an observed value of 0
is the interval 0-1 inclusive. Thus there is absolutely no chance
of rejecting $H_1$ if $H_0$ is correct. This situation is described by
saying that the power of the test of $H_1$ is 0 if $H_0$ is correct.

Definition. The power of a test $T$ of a hypothesis $H$ is the
probability of rejecting $H$, using $T$. The power of $T$ is a function
of which hypothesis is actually correct, as well as the level
of confidence chosen; it is computed by summing probabilities
of those observations which would entail the rejection of $H$.

Example. In the orange-chip problem, the power of our test of $H_1$ is 0 if
$H_0$ is correct; the power of our test of $H_1$ is

.2051 + .1172 + .0439 + .0098 + .0010 = .3770

if $H_2$ is correct; the power of our test of $H_1$ is

.1460 + .2503 + .2816 + .1877 + .0563 = .9219

if $H_3$ is correct; the power of our test of $H_1$ is 1 if $H_4$ is correct. If $H_1$ is
correct, the power of our test of $H_0$ is 1 - .0563 = .9437; the power of our test
of $H_2$ is

.0563 + .1877 + .0000 + .0000 = .2440

etc.
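The power calculations in the example can be reproduced by summing, on the correct hypothesis, the relative frequencies of the values that reject the tested hypothesis. This Python sketch assumes the same rejection regions as inferred from the text (they are not printed in this section), and the function name `power` is illustrative. Because the text adds rounded table entries, its .2440 differs slightly from the unrounded value.

```python
from math import comb

# Assumed rejection regions for the 10-draw problem, inferred from the text.
REJECT = {
    0: set(range(1, 11)),
    1: set(range(6, 11)),
    2: {0, 1, 9, 10},
    3: set(range(0, 5)),
    4: set(range(0, 10)),
}


def relative_frequencies(orange, chips=4, draws=10):
    """Relative frequency of x orange chips in `draws` draws with replacement."""
    total = chips**draws
    return [
        comb(draws, x) * orange**x * (chips - orange) ** (draws - x) / total
        for x in range(draws + 1)
    ]


def power(tested, correct):
    """Probability of rejecting H_tested when H_correct is the true state."""
    freq = relative_frequencies(correct)
    return sum(freq[x] for x in REJECT[tested])


for correct in range(5):
    row = " ".join(f"{power(tested, correct):.3f}" for tested in range(5))
    print(f"H_{correct} correct: {row}")
```

The diagonal of the printed array (tested hypothesis equal to the correct one) gives the false-rejection probabilities rather than power in the useful sense.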

EXERCISES 

3.4.1. Using the table in Sec. 3.2, make a table showing the power of our
test of each hypothesis for each possible correct hypothesis.

3.4.2. Using the table in Sec. 3.1, make a table showing the power of a test
(rejecting at the .03 level of confidence) of each hypothesis for each possible
correct hypothesis.

3.5. Increasing the Power of a Test by Increasing the Number 
of Observations. No matter which hypothesis is correct, that is, 
no matter what the true state of affairs, we can increase the 
power of our test of an incorrect hypothesis to as close to 1 as we 
wish by increasing the number of observations, that is, the size 
of the sequence which we draw from the bowl. The expected 
sampling distributions for a sequence of 100 chips drawn at 
random are shown by Table 3.5.1. 

Table 3.5.1. Expected Sampling Distributions of Number of 
Orange Chips in Sequence of 100 Chips Drawn Randomly 
from Bowl Containing Four Chips, Replacing after 
Each Draw 


Number of orange chips in a sequence of 100 chips
0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-100

In this table we have grouped the possible observed numbers
of orange chips into intervals, but our heavy black lines indicate
confidence intervals, as before, except that now we can say
before drawing our sequence that the probability that we are
going to make a false assertion (that is, reject a hypothesis when
it is actually correct) is at most .018. Whereas with a sequence
of 10 the power of our test of $H_1$ if $H_0$ is correct is 0, with a
sequence of 100 the power of our test of $H_1$ if $H_0$ is correct is
1.000. Similarly, the power of our test of $H_1$, if $H_2$ is correct,
has been increased from .3770 to .443 + .511 + .028 = .982.
As an exercise the student should compute the increase of the
power of our test of each hypothesis, given that each hypothesis
in turn is the correct one.

To increase the power of our test still further we can take a
sequence of 1,000. The expected sampling distributions are
shown by Table 3.5.2.
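The three terms .443, .511, and .028 are the probabilities, on $H_2$, of the intervals 40-49, 50-59, and 60-69 in a sequence of 100 draws. The following Python sketch recomputes them exactly. It assumes, as the sum in the text suggests, that $H_1$ is rejected when 40 or more orange chips are observed and that values of 70 or more contribute nothing at this precision.

```python
from math import comb


def relative_frequencies(orange, draws, chips=4):
    """Relative frequency of x orange chips in `draws` draws with replacement."""
    total = chips**draws
    return [
        comb(draws, x) * orange**x * (chips - orange) ** (draws - x) / total
        for x in range(draws + 1)
    ]


freq_h2 = relative_frequencies(2, 100)

# Probabilities, on H_2, of the grouped intervals 40-49, 50-59, 60-69.
pieces = [sum(freq_h2[lo:lo + 10]) for lo in (40, 50, 60)]
power_h1_given_h2 = sum(pieces)
print([round(p, 3) for p in pieces], round(power_h1_given_h2, 3))
```

Integer arithmetic is used for the counts, with a single division at the end, so no precision is lost to very small intermediate powers.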

Table 3.5.2. Expected Sampling Distribution, for Each Possible
Hypothesis, of Number of Orange Chips in Sequence of
1,000 Random Draws, Replacing after Each Draw

            Number of orange chips in a sequence of 1,000 draws

       0-99   100-199  200-299  300-399  400-499  500-599  600-699  700-799  800-899  900-1,000

H_0    1      0        0        0        0        0        0        0        0        0
H_1    .0000  .0001    .9998    .0001    .0000    .0000    .0000    .0000    .0000    .0000
H_2    .0000  .0000    .0000    .0000    .488     .512     .0000    .0000    .0000    .0000
H_3    .0000  .0000    .0000    .0000    .0000    .0000    .0001    .9998    .0001    .0000
H_4    0      0        0        0        0        0        0        0        0        1

No matter which hypothesis is correct, the power of our test of
any incorrect hypothesis is now approximately 1. Another way
of stating the law that the power can be increased to as close to 1
as desired is to say that the probability that the proportion of
orange chips in the sequence will diverge more than a fixed
amount from the proportion of orange chips in the bowl
approaches 0 as a limit if we increase the size of the sequence.
This is known as Bernoulli's theorem, which is usually stated in
the following way: If the chance of an event occurring upon a
single trial is p, and if a number of independent trials are made,
the probability that the ratio of the number of successes to the
number of trials differs from p by less than any preassigned
quantity, however small, can be made as near certainty as may
be desired by taking the number of trials sufficiently large. As
an exercise the student should satisfy himself that these
statements are equivalent.
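
Bernoulli's theorem can be illustrated numerically. The following is a minimal sketch, not part of the original text: it assumes a bowl in which the proportion of orange chips is .25 and a fixed tolerance of .10 (both values are chosen here only for illustration), and computes the exact binomial probability that the sample proportion diverges from .25 by more than the tolerance. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials), computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def prob_diverges(n, p, eps):
    """P(|successes/n - p| > eps) for n independent trials with chance p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) > eps)


# Assumed illustration values: p = .25 (one hypothesis of the chips-in-bowl
# problem) and a tolerance of .10 on the sample proportion.
for n in (10, 100, 1000):
    print(n, prob_diverges(n, 0.25, 0.10))
```

The printed probabilities shrink toward 0 as the sequence lengthens, which is the content of the theorem for this particular p and tolerance.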

We summarize the effect of increasing the number of observa¬ 
tions in Table 3.5.3. This table shows the power of the test of 
each hypothesis, assuming the hypothesis at the left to be cor¬ 
rect, for 10, 100, and 1,000 observations. 

Table 3.5.3. Power of Test of Each Hypothesis in Chips-in-bowl
Problem, for Sequences of 10, 100, and 1,000,
Showing How Power Increases (if
Hypothesis Tested Is Incorrect) as
Number of Observations Increases

                                    Hypothesis tested
Correct       Number of
hypothesis    observations    H_0      H_1      H_2      H_3      H_4

H_0                 10        0        0        1        1        1
                   100        0        1        1        1        1
                 1,000        0        1        1        1        1

H_1                 10        .944     .020     .244     .922     1.000
                   100        1.000    .000     .999     1.000    1.000
                 1,000        1.000    .000     1.000    1.000    1.000

H_2                 10        .999     .377     .021     .377     .999
                   100        1.000    .982     .018     .972     1.000
                 1,000        1.000    1.000    .000     1.000    1.000

H_3                 10        1.000    .922     .244     .020     .944
                   100        1.000    1.000    .851     .000     1.000
                 1,000        1.000    1.000    1.000    .000     1.000

H_4                 10        1        1        1        0        0
                   100        1        1        1        1        0
                 1,000        1        1        1        1        0

3.6. A Word of Caution. In Table 3.5.2 we see that the
probability of obtaining between 300 and 399 orange chips in our
sequence of 1,000 draws is very small no matter which hypothesis
is correct. What then if we should actually draw a sequence
containing, for example, 380 orange chips? At first it might
appear that we should accept H_1 as the correct hypothesis, in
view of the fact that .0001 is the largest probability in the
column headed 300-399; however, since .0001 is itself such a
small probability we should perhaps question our method of
drawing the chips; it may be that our method contains some bias
which would destroy the validity of the mathematical model
which we have set up. In any practical situation the research
worker should always be aware that unless he knows some
hypothesis upon which the results which he obtained would be
reasonably probable, he is perhaps using a mathematical model
to which he has no scientific right. As a matter of fact, in the
situation which we have been using as an example, it would be
more prudent to construct Table 3.6.1.


With Table 3.6.1 before us, we can be confident (at the .05
level) that we are going to draw either 0, 224-276, 469-531,
724-776, or 1,000 orange chips. If we draw some other number
of orange chips we may wish to reject (at the .05 level of
confidence) our working assumption, namely, that our method of
drawing is a random one. Furthermore, we can be confident
at the .01 level that we are going to draw either 0, 216-284,
460-540, 716-784, or 1,000 orange chips; our failure to do so
would make us even more suspicious of our method of drawing.
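
The intervals quoted above can be checked against the binomial model. The sketch below is an illustration added here, not the author's computation: it assumes 1,000 independent draws with replacement and sums exact binomial probabilities over each interval. Because the quoted limits were presumably obtained from an approximation (an assumption on our part), the exact probabilities may differ slightly from the nominal .95 and .99.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(exactly k orange chips in n draws) when the bowl proportion is p."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def prob_between(lo, hi, n, p):
    """P(lo <= number of orange chips <= hi), limits inclusive."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


n = 1000
# (proportion of orange chips, .05-level interval, .01-level interval)
intervals = [
    (0.25, (224, 276), (216, 284)),
    (0.50, (469, 531), (460, 540)),
    (0.75, (724, 776), (716, 784)),
]
for p, at_05, at_01 in intervals:
    print(p,
          round(prob_between(*at_05, n, p), 4),
          round(prob_between(*at_01, n, p), 4))
```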

3.7. The Logic of Statistical Inference. The mathematical
models which we have just been discussing are very special
cases of statistical inference; in future chapters we shall take up
models which can have a wider application. However, the
fundamental logic of the procedure will remain the same. Always
in statistical inference, there are certain hypotheses about a
population. If, on the basis of each hypothesis, the expected
sampling distribution of a statistic can be determined, it is then
possible to make some observations (that is, draw a sample) and
then to reject certain hypotheses as untenable and to accept
other hypotheses as tenable; in other words, to determine a set
of hypotheses (confidence interval) such that one is confident (at
a certain level of confidence) that one of the hypotheses in the set
is correct. The purpose of the foregoing discussion has been to
give the student a firm grasp of this fundamental logic. He
should now be able to handle problems like the following one.
Upon a certain hypothesis H, the expected sampling distribution
of a certain statistic z is given by the following table:

             Possible Values of the Statistic z

   z_1-z_4    z_5-z_7    z_8-z_12    z_13-z_19    z_20-z_40

    .005       .020       .950        .020         .005

Suppose we draw a sample and obtain a value of z_16 for our
statistic. At what level of confidence can we reject H? If
there are other hypotheses upon which the value of z_16 is a
reasonably probable one (that is, in relation to other possible
values¹), we can reject H at the .05 level of confidence.




3.7.1. On the hypothesis H the expected sampling distribution of the sta- 
.... 1 v (5 — \W — 5h2 

f 2, .... 9. (The 


tistic w is given by f(w) = ^ -i f-" 5 1>* whe n w 

50 


vertical lines mean the absolute value of whatever is between them. The
absolute value of any number is never negative; for example, |3 - 7| = |-4| = 4.)

At what level of confidence can we reject H if the observed value of w turns
out to be 9?

3.7.2. On the hypothesis H the expected sampling distribution of the
statistic y is given by f(y) = (4 - |y - 4|)^3 / 136 when y = 1, 2, . . . , 7.
The observed value of y is 6. Test H.


3.7.3. On the hypothesis 11’ (read “H prime”) the expected sampling dis¬ 
tribution of the statistic x is given by f(x) = - when x = 1, 2 9 

The observed value of x is 9. Test II'. 

u 3 f 4 °?5 t - l^llr H the expected sampling distribution of 2 is given 
by f(z) = -when z = 1, 2, . . . , 8. The observed value of 2 

is 8. Test H. 

1 When the probability of any one specified value of a statistic is
very small, what matters is the probability of that value in comparison with the
probabilities of other specific values. If we flip a penny 1,000 times, the
probability of obtaining 500 heads is small, yet larger than that of any other
specified value.

3.7.5. On the hypothesis H_2 the expected sampling distribution of w is given
by f{w) = 65^! with w = 1 , 2_,8. The observed value is 7. 


Test H_2.

3.7.6. On the hypothesis Hn the expected sampling distribution of the
statistic t is given (approximately) by the following table:

   Interval of t              Probability

   (-7.000)-(-3.850)            .0005
   (-3.850)-(-2.845)            .0045
   (-2.845)-(-2.528)            .005
   (-2.528)-(-2.086)            .015
   (-2.086)-(2.086)             .950
   (2.086)-(2.528)              .015
   (2.528)-(2.845)              .005
   (2.845)-(3.850)              .0045
   (3.850)-(7.000)              .0005

Test Hn for an observed value of t = 2.7.

3.7.7. On the hypothesis Hn the expected sampling distribution of the
statistic CR is given (approximately) by the following table:

   Interval of CR             Probability

   (-3.291) or lower            .0005
   (-3.291)-(-2.576)            .0045
   (-2.576)-(-2.326)            .005
   (-2.326)-(-1.960)            .015
   (-1.960)-(1.960)             .950
   (1.960)-(2.326)              .015
   (2.326)-(2.576)              .005
   (2.576)-(3.291)              .0045
   (3.291) or higher            .0005

The observed value is CR = -2.0. Test Hn.

3.7.8. On the hypothesis H the expected sampling distribution of the mean
breaking strength in samples of 50 pieces of a certain kind of rope is given by
the following table:

   474 or lower    475-480    481-519    520-525    526 or higher

      .005          .020       .950       .020         .005

A sample of 50 pieces of rope is drawn randomly and the mean breaking
strength is found to be 479. Test H.

3.7.9. On the hypothesis H the expected sampling distribution of the mean
intelligence-test score for 30 army officers drawn at random from a certain
army base is given by the following table:

1160fl0Wer 117 - 118 I H8-122 | 123-124 I 125 or higher 


A sample of 30 scores yields a mean of 116. Test H 

3.8. Two Types of Error. It is apparent from the previous
discussion that either of two types of error can be made in
drawing an inference about a population from a sample:

Type I Error. A hypothesis may be rejected when it is, in fact,
correct. The level of confidence states the maximum
probability of making a type I error.

Type II Error. A hypothesis may fail to be rejected when it
is, in fact, incorrect. As the power of a test (P) states the
probability that a hypothesis will be rejected, 1 - P is the
probability of making a type II error, except when the hypothesis
is correct, in which case P is the probability of making a type I
error.

noJ'r L Peobabilitie s of Making Type II 


091 R T.,„ - * i iPE ll H.HROH When TJsr 

i™; C T IDENCE (Boldface Type) Compared with 
Those When Using .1344 Level of Confidence, in 

Chip-in-bowl Problem* 

Hypothesis tested 
We can decrease the probability of making a type I error by
taking lower and lower levels of confidence. However, with the
same number of observations, a lower level of confidence
decreases the power of a test and thus increases the probability
of a type II error. Consider, for example, Table 3.8.1.

Table 3.8.2. Probabilities of Making Type II Error in
Chips-in-bowl Situation for 10, 100 (Boldface), and 1,000 (Italics)
Observations, with Level of Confidence = .0216

In many situations it is not feasible or perhaps even possible
to increase the number of observations, and one must choose 
between taking a low level of confidence and thus increasing the 
probability of a type II error, or taking a higher level of confi¬ 
dence and thus increasing the probability of a type I error. 
Which course one follows depends upon the requirements of a 
particular situation. For example, if the rejection of a given 
hypothesis would imply that certain very costly changes should 
be made in a factory, school, or business, or in the case of scien¬ 
tific research that drastic changes must be made in a well-estab¬ 
lished theory, and if there are no other very compelling reasons 
to make these changes, then a very low level of confidence should 
probably be required, that is, the probability of making a type I 
error should be made very low. If, on the other hand, one must 
take some kind of action on the basis of the best possible evi¬ 
dence as to which hypotheses are most tenable, then a higher 
level of confidence should be taken with a resulting lower prob¬ 
ability of a type II error. 
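
The trade-off between the two types of error can be seen in a small computation. The sketch below is an added illustration under stated assumptions: a sequence of 10 draws, a tested hypothesis that the proportion of orange chips is .50, a two-sided rejection region (c or fewer, or 10 - c or more, orange chips), and an alternative proportion of .25. The rejection rule and the function names are chosen here for illustration and are not taken from the tables of the text.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k orange chips in n draws) when the bowl proportion is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def reject_prob(c, n, p_true):
    """P(count <= c or count >= n - c) when the true proportion is p_true."""
    return sum(binom_pmf(k, n, p_true)
               for k in range(n + 1) if k <= c or k >= n - c)


n = 10
for c in (1, 2):
    # Tested hypothesis (p = .50) is correct: rejecting it is a type I error.
    type_1 = reject_prob(c, n, 0.50)
    # Proportion .25 is correct: failing to reject p = .50 is a type II error.
    type_2 = 1 - reject_prob(c, n, 0.25)
    print(c, round(type_1, 4), round(type_2, 4))
```

Widening the rejection region (c = 2 instead of c = 1) raises the probability of a type I error and lowers the probability of a type II error, with the number of observations held fixed.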

3.9. Confidence Intervals with One Bound Only. In many 
cases we may be interested in ascertaining a confidence interval 
with lower bound only; that is, we may be interested in being 
able to say with a certain degree of confidence that the popula¬ 
tion characteristic (called a parameter) is at least as great as a 
certain value (lower bound). Conversely, we may wish to say 
that the parameter is at most as great as a certain value (upper 
bound). In either case we shall sum the cells in our table of 
expected sampling distributions in one direction only. Consider, 
for example, a population which has a parameter which must of
necessity have one of the values a, b, c, d, e, or f, which are in
ascending order. Let z_1, z_2, . . . , z_11 be the possible values of a
certain statistic, in ascending order. Let the expected sampling
distributions of z be given by Table 3.9.1.¹

The heavy black lines indicate the .05 confidence interval for 
the parameter for each observed value of the sample. For 
example, for an observed value of z_3 the .05 confidence interval
is c-f inclusive, c being the lower bound. We are in this case not
interested in the danger of including e and f in the interval, even
though they seem rather unlikely; all we care about is that the
parameter is at least c (at the .05 level of confidence). At first
it may appear that we are merely giving up information
contained in the sample, that we are gaining nothing by taking
lower bound only. That this is not true can be seen by
considering that with lower bound only we can say, with an observation
of z_7, that the parameter is at least e (at the .05 level of
confidence), whereas, if we had taken both upper and lower bounds
we could say only that the parameter lies in the interval d-e
inclusive. It may appear to the student that this juggling of
confidence intervals is highly suspect, that statistics seems to be
a highly arbitrary business. It is true that the method of
selection of confidence intervals is to a certain extent arbitrary and
depends upon the purposes of the investigator. However, the
statistician can always justify his choice of a method of
ascertaining a confidence interval (if that choice is made before a
sample is drawn) by pointing out that before taking his sample
the probability was p that he would make a false assertion about
the limits within which the parameter lies.

Table 3.9.1. Expected Sampling Distributions of z, on Each
Possible Hypothesis

                    Possible values of z in ascending order

       z_1    z_2    z_3    z_4    z_5    z_6    z_7    z_8    z_9    z_10   z_11

H_f   .0000  .0000  .0000  .0000  .0000  .0000  .0203  .1010  .3152  .4779  .0856
H_e   .0000  .0029  .0033  .0155  .0712  .1064  .3117  .2622  .1814  .0356  .0098
H_d   .0176  .0202  .2163  .4403  .1985  .0571  .0215  .0154  .0097  .0025  .0008
H_c   .2447  .4962  .1635  .0497  .0204  .0128  .0076  .0033  .0016  .0002  .0000
H_b   .9215  .0371  .0196  .0102  .0058  .0034  .0015  .0008  .0001  .0000  .0000
H_a   1      0      0      0      0      0      0      0      0      0      0

1 The question of whether we can think up an actual situation which fits this
mathematical model is irrelevant to this discussion.

A somewhat more convenient table in the foregoing type of
problem is one which shows for each value z_i the probability of
obtaining a value of z at least as great as z_i. Our table is
transformed into Table 3.9.2.


Table 3.9.2. Expected Sampling Distributions of z for Each
Possible Hypothesis, Showing the Probability of
Obtaining a Value of z at Least
as Great as z_i



With Table 3.9.2 the lower bound for the desired confidence 
interval can be seen at a glance. If we had desired an upper 
bound only, we would have summed in the opposite direction, 
that is, from left to right; our table would then show the c.d.f. on 
the basis of each hypothesis. 
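
The summation from right to left can be sketched in a few lines. The code below is an added illustration, not the author's table: it copies the probabilities of Table 3.9.1, forms for each hypothesis the probability of a value at least as great as z_i, and reads off the lower bound at the .05 level as the smallest parameter value whose probability exceeds .05. It assumes that the two unlabeled rows of the scanned table belong to H_c and H_b, and the names `upper_tail` and `lower_bound` are hypothetical.

```python
# Rows of Table 3.9.1: probabilities of z_1, ..., z_11 under each hypothesis.
# Assumption: the two rows without a legible label are H_c and H_b.
TABLE_3_9_1 = {
    "a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "b": [.9215, .0371, .0196, .0102, .0058, .0034, .0015, .0008, .0001, .0000, .0000],
    "c": [.2447, .4962, .1635, .0497, .0204, .0128, .0076, .0033, .0016, .0002, .0000],
    "d": [.0176, .0202, .2163, .4403, .1985, .0571, .0215, .0154, .0097, .0025, .0008],
    "e": [.0000, .0029, .0033, .0155, .0712, .1064, .3117, .2622, .1814, .0356, .0098],
    "f": [.0000, .0000, .0000, .0000, .0000, .0000, .0203, .1010, .3152, .4779, .0856],
}


def upper_tail(row):
    """For each z_i, the probability of a value at least as great as z_i."""
    tails, running = [], 0.0
    for prob in reversed(row):      # sum from right to left
        running += prob
        tails.append(running)
    return tails[::-1]


TAILS = {h: upper_tail(row) for h, row in TABLE_3_9_1.items()}


def lower_bound(i, alpha=0.05):
    """Smallest parameter value not rejected when z_i (1-based) is observed."""
    for h in "abcdef":              # parameter values in ascending order
        if TAILS[h][i - 1] > alpha:
            return h
    return None


for h in "fedcba":
    print("H_" + h, [round(t, 4) for t in TAILS[h]])
print([lower_bound(i) for i in range(1, 12)])
```

Under these assumptions an observation of z_7 gives the lower bound e, in agreement with the discussion of Table 3.9.1.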




Chapter 4 

PARAMETERS AND STATISTICS 


4.1. Definitions 

Definition. A parameter is a characteristic of a population or
its distribution.

Examples. 1. The proportion of the population of IQ’s of California school 
children falling above 100. 

2. The total range of values of the population of high jumps by athletes at 
the Olympic games in 1936. 

3. The value occurring most frequently in the population of all possible 
ways in which two dice can land, each way having as value the sum of the 
numbers indicated by the two faces. (This value is, as we have seen, 7.) 

We shall usually refer to parameters with Greek letters.

Definition. A statistic is any characteristic of a sample or its
distribution.¹ We shall refer to statistics with Latin letters.

4.2. Measures of Central Tendency. It is often useful to 
designate some value of a distribution which represents the 
center or central tendency in some sense. There are several kinds 
of center which we might choose; some of these are defined as 
follows. 

Definition. The mode of a distribution is the value having the 
highest frequency, if there is such a value. 

Examples. 1. The parameter listed as Example 3 in Sec. 4.1. 

2. The value schizophrenic in the distribution of patients in a certain 
hospital: (%, schizophrenic), (% 4 , manic-depressive), 0^4, paranoiac). 

3. The distribution (1/4, x_1), (1/4, x_2), (1/4, x_3), (1/4, x_4) has no mode.

1 Later in our discussion the term “statistic” will also be used to include
variables which are not determined by the sample alone but depend also upon some
parameter; for example, if Y = sample mean - population mean, then Y is not a
characteristic of either population or sample alone, but we choose to call Y a
“statistic.”
Definition. The median of a distribution is the value x such
that F(x) > .5 and F(value just lower than x) < .5, if such a
value exists. The median is therefore the middlemost value.

f Thefr'f ( 1 26?Sw i ^^' d ' f ' ( ‘ 3 ” 2) ’ <-M), (h-4). 

e r,T - ^o,2), (.25,3), (.50,4) has no median. ' 

If a distribution is qualitative, that is, such that the
values cannot be put in order, it does not have a median.

Definition. The arithmetic mean of a distribution is the sum
of the products of the values and their corresponding
proportions, that is, x_1 f(x_1) + x_2 f(x_2) + . . . + x_k f(x_k), assuming that
the values are quantitative and do not indicate merely an order.
For finite populations, the arithmetic mean is the same as the
“average” used in school arithmetic. (As an exercise the
student should prove this to himself.)

is 2 E S7+3(W+ T 5(‘“7“' ° f "* (W.2). (*,»>, (*,5) 

1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5

Note that if the values are qualitative or ordinal (that is, such
that they can be ordered but not such that we can assign
distances between them) the arithmetic mean does not exist. For
example, the population (3,X_1), (2,X_2), (4,X_3), (1,X_4), in which
the values are ranks, does not have an arithmetic mean.

The arithmetic mean is also called the expected value of a
member of the population (or sample) to be chosen at random
(or the mathematical expectation of a member to be chosen at
random). If we let X be an abbreviation for “a member of the
set to be chosen at random” and E be an abbreviation for
“expected value of,” then E(X) means the expected value of a
member of the set to be chosen at random. We shall use E(X)
to refer indifferently to either the sample or the population
mean; that is, a statement utilizing this symbol will hold for both
population and sample. When speaking specifically of one or
the other, we shall use m for the sample mean and μ for the
population mean.

The arithmetic mean is the center of gravity of a distribution in
the sense that the sum of the deviations from it is zero, the
deviation of the member X_i being x_i - E(X). (The student
should prove this as an exercise.)

We can also define the expected value of any function g of X
as E[g(X)] = g(x_1)f(x_1) + g(x_2)f(x_2) + . . . + g(x_k)f(x_k).
The reader should satisfy himself that, if g and h are any
functions and c is a constant, then

E[cg(X) + h(X)] = cE[g(X)] + E[h(X)]

This important property of E is summed up by saying that E is a
linear operator. Note also that E(c) = c. Remembering that
a constant is a function of any variable, the student should
note the special case E[cg(X) + b] = cE[g(X)] + b, where both
b and c are constants.
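
The linearity of E can be checked on a small distribution. The sketch below is an added illustration: it uses the distribution in which each of the values 1 to 6 has proportion 1/6, and the particular choices g(x) = x^2, h(x) = x, c = 3, and b = 2 are assumptions made only for this example.

```python
from fractions import Fraction

# Distribution with values 1, ..., 6, each having proportion 1/6.
DIE = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, dist=DIE):
    """E[g(X)]: sum of g(x) f(x) over all values x of the distribution."""
    return sum(g(x) * f for x, f in dist.items())


c, b = 3, 2  # illustrative constants

lhs = expect(lambda x: c * x**2 + x)                       # E[cg(X) + h(X)]
rhs = c * expect(lambda x: x**2) + expect(lambda x: x)     # cE[g(X)] + E[h(X)]
print(lhs, rhs)

special = expect(lambda x: c * x**2 + b)                   # E[cg(X) + b]
print(special, c * expect(lambda x: x**2) + b)
```

Exact fractions are used so that the two sides can be compared without rounding.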

Definition. The geometric mean of a distribution is the Nth
root of the product of the values of all N members of the
population (or sample), each value occurring in the product as many
times as there are members in the set which have that value,
that is, geometric mean = (w_1 w_2 . . . w_N)^(1/N), in which w_i is the
value of the ith member. Note that

w_1 w_2 . . . w_N = (x_1)^(n_1) (x_2)^(n_2) . . . (x_k)^(n_k)

in which x_i is a value different from every other value and n_i is
the number of members of the set having that value. Therefore

Geometric mean = (w_1 w_2 . . . w_N)^(1/N)
               = x_1^(n_1/N) x_2^(n_2/N) . . . x_k^(n_k/N)
               = x_1^f(x_1) x_2^f(x_2) . . . x_k^f(x_k)


Example. The geometric mean of the distribution GbD, (H,2), (K,3), 
(H,A) is 11*2*3*4* = 1.924. 

Measures of central tendency other than those listed are
rarely used.

4.3. Measures of Variability. It is useful to indicate the
amount of variation in value among the members of a
population or sample. The most obvious measure of variation is the
range.

Definition. The range of a distribution is the difference
between the highest and lowest values.

Example. In the distribution 0+-4), (K,0), (y, 1), (y, 3 ) the range is 
3 - (-4) =3+4 = 7. 

By far the most frequently used measures of variability are 
the variance, and its square root, the standard deviation. 
Definition. The variance of a distribution is the arithmetic
mean of the squared deviations from the arithmetic mean; that
is,

Variance = [x_1 - E(X)]^2 f(x_1) + [x_2 - E(X)]^2 f(x_2)
           + . . . + [x_k - E(X)]^2 f(x_k)

The standard deviation is the square root of the variance.

The variance of a population is denoted by σ^2, the variance of
a sample by s^2. We shall use “Var” to denote either a
population or sample variance. Note that Var = E{[X - E(X)]^2},
that is, the variance of a distribution is the expected value of a
squared deviation from the arithmetic mean.

Example. The distribution (1/6,1), (1/6,2), (1/6,3), (1/6,4), (1/6,5), (1/6,6) has
the mean 3.5. (Unless otherwise indicated, “mean” in this text always refers
to the arithmetic mean.) Therefore the variance is

(1 - 3.5)^2 (1/6) + (2 - 3.5)^2 (1/6) + (3 - 3.5)^2 (1/6) + (4 - 3.5)^2 (1/6)
   + (5 - 3.5)^2 (1/6) + (6 - 3.5)^2 (1/6) = 35/12

For computational purposes note that

E{[X - E(X)]^2} = E{X^2 - 2XE(X) + [E(X)]^2}
   = E(X^2) - 2E(X)E(X) + [E(X)]^2 = E(X^2) - [E(X)]^2.

For example, we could have computed the variance 35/12 by
taking

E(X^2) = (1)(1/6) + (4)(1/6) + (9)(1/6) + (16)(1/6)
   + (25)(1/6) + (36)(1/6) = 91/6

Thus Var = E(X^2) - [E(X)]^2 = 91/6 - 49/4 = 35/12.
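
The agreement of the two ways of computing the variance can be verified mechanically. The following sketch is an added illustration using exact fractions for the same distribution (values 1 to 6, each with proportion 1/6); the variable names are chosen here for the example.

```python
from fractions import Fraction

# Values 1, ..., 6, each with proportion 1/6.
DIE = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * f for x, f in DIE.items())                    # E(X)
mean_of_squares = sum(x**2 * f for x, f in DIE.items())      # E(X^2)

# Definition: expected squared deviation from the mean.
var_by_definition = sum((x - mean)**2 * f for x, f in DIE.items())
# Computational formula: E(X^2) - [E(X)]^2.
var_by_formula = mean_of_squares - mean**2

print(mean, mean_of_squares, var_by_definition, var_by_formula)
```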

EXERCISES 

4.3.1. The following are sample data, each datum being the value of a
member of the sample: 3, 2, 1, 1, 1, 3, 1, 4, 5, 1, 2, 4, 3, 2, 3, 5, 4, 2, 5, 1, 2.
Compute each of the following statistics, if it exists.

a. The mode 

b. The median 

c. The arithmetic mean 

d. The variance (compute in two ways) 

e. The standard deviation

4.3.2. Compute the same statistics as in Exercise 4.3.1 for the following
sample data: .00, .02, .00, .01, .03, .05, .00, .01, .03, .01, .05, .02, .01, .05, .04,
.00, .01.

4.3.3. Compute the same statistics as in Exercise 4.3.1 for the following 
sample data: for, for, against, for, against, for, for, against, against, for, for, 
for, against, for, against, against, for. 

4.3.4. Compute the same statistics as in Exercise 4.3.1 for the following 

sample data: 9, 8, 9, 9, 8, 7, 9, 6, 7, 7, 9, 8.

4.3.5. Find the arithmetic mean and the variance for each of the
populations mentioned in the following exercises:

a. Exercise 1.7.1 

b. Exercise 1.7.2 

c. Exercise 1.7.3 

d. Exercise 1.7.4 

e. Exercise 1.7.5 

4.3.6. Find the arithmetic mean and the variance for each of the popula¬ 
tions mentioned in the following exercises: 

a. Exercise 1.8.1 

b. Exercise 1.8.2 

c. Exercise 1.8.3 

d. Exercise 1.8.5 

e. Exercise 1.8.9 
f. Exercise 1.8.10

g. Exercise 1.8.11

h. Exercise 1.8.12

i. Exercise 1.8.13

4.3.7. Find the arithmetic mean and the variance of the expected sampling 
distribution of the arithmetic mean of random samples of size 2 from each of 
the following populations: 

a. (0,X_1), (0,X_2), (0,X_3), (1,X_4), (1,X_5)

b. (0,X_1), (0,X_2), (1,X_3), (1,X_4), (2,X_5)

Compare the arithmetic means which you have just computed with
the arithmetic means of the two populations from which the samples are
drawn.

4.3.8. Find the arithmetic mean and the variance of the expected sampling
distribution of the arithmetic mean of random samples of size 3 from each of
the following populations:

a. (0,X_1), (0,X_2), (1,X_3), (1,X_4), (1,X_5)

b. (0,X_1), (1,X_2), (1,X_3), (2,X_4), (2,X_5)

Compare the arithmetic means which you have computed with the
arithmetic means of the two populations from which the samples are drawn.

4.3.9. Find the arithmetic mean and the variance of each row of Table 3.1.1. 
Compare the arithmetic mean in each case with the mean number of orange 
chips hypothesized, giving each orange chip in the bowl the value 1 and each 
black chip the value 0. 

4.3.10. For the table which you constructed in Exercise 3.1.2, find the 
arithmetic mean and the variance for each of the three rows. Compare the 
arithmetic mean in each case with the arithmetic mean of the hypothesized 
population, giving each A the value 1 and each non-A the value 0.

4.3.11. Find the arithmetic mean and the variance for each row of Table 
3.2.1. Compare the arithmetic mean in each case with that of the hypothe¬ 
sized population, giving each orange chip the value 1 and each black chip the 
value 0. 

4.3.12. Prove that the arithmetic mean of the expected sampling
distribution of arithmetic means for any size sample from a finite population is the
arithmetic mean of the population, that is, that E(m) = μ, where m is the
sample mean and μ is the population mean.

4.4. Abbreviations. The student should by now desire an
abbreviation for an expression like

x_1 f(x_1) + x_2 f(x_2) + . . . + x_k f(x_k)

We shall abbreviate an expression of this kind by using the
symbol Σ, called the summation sign. Using the summation
sign, the above expression is abbreviated Σ_{i=1}^{k} x_i f(x_i). The letter
i in our abbreviation is called an index. Note that our index
does not appear in the expression when it is fully written out;
therefore, we could have used j or l or any other letter not
appearing in the expression as the index. The expression
(3 - a)^c f(3) + (4 - a)^c f(4) + . . . + (m - a)^c f(m) can be
abbreviated Σ_{x=3}^{m} (x - a)^c f(x). This latter expression is read
“the summation of x minus a to the cth power times f of x with x
running from 3 to m.”

A product can be abbreviated in an exactly analogous way
using the symbol Π. For example, the geometric mean is
defined as Π_{i=1}^{k} x_i^f(x_i).

It is customary to drop the index and superscript when the
context clearly indicates what should be summed or multiplied,
and we shall follow this convention; for example, we shall write
Σxf(x) for Σ_{i=1}^{k} x_i f(x_i) when there is no danger of confusion.

EXERCISES 
4.4.5. Write an expression for the probability that a sequence of 30 choices
(each choice having two alternatives, only one of which is correct) by a child
in a problem which is too difficult for the child (so that the child chooses
randomly) will contain

a. at least 20 correct choices

b. between 10 and 20 correct choices

c. at most 25 correct choices


4.5. Moments. The mean and the variance of a distribution
are merely special cases (though very important ones) of
moments of that distribution.

Definition. The kth moment about the point c is the expected
value of the kth power of the deviations from the point c; that is,
mom_c^k = E[(X - c)^k] = Σ_{all x} (x - c)^k f(x).

Examples. 1. The arithmetic mean is the first moment about the origin,
that is, mom_0^1 = E(X) = E[(X - 0)^1].

2. The variance is the second moment about the mean, that is, mom_{E(X)}^2.
The computational formula for the variance, derived in Sec. 4.3, is the second
moment about the origin minus the square of the first moment about the
origin; that is, mom_{E(X)}^2 = mom_0^2 - (mom_0^1)^2 or Var = E(X^2) - [E(X)]^2.


The moments most frequently used are those about the origin
(c = 0) and about the mean [c = E(X)]. The number k is
called the order of the moment.

We can easily prove that the second moment about the point c
is a minimum when c = E(X), that is, the second moment about
the mean is the smallest possible second moment. The proof is
as follows:

E[(X - c)^2] - E{[X - E(X)]^2} = E(X^2) - 2cE(X) + c^2
   - E(X^2) + [E(X)]^2 = -2cE(X) + c^2 + [E(X)]^2
   = [E(X) - c]^2


which is positive unless c = E(X).
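
The identity just proved can be checked numerically. The sketch below is an added illustration: it uses the distribution with values 1 to 6, each with proportion 1/6, and a few trial points c chosen arbitrarily; the function name `moment` is hypothetical.

```python
from fractions import Fraction

# Values 1, ..., 6, each with proportion 1/6.
DIE = {x: Fraction(1, 6) for x in range(1, 7)}


def moment(dist, k, c):
    """kth moment about the point c: E[(X - c)^k]."""
    return sum((x - c)**k * f for x, f in dist.items())


mean = moment(DIE, 1, 0)          # first moment about the origin
variance = moment(DIE, 2, mean)   # second moment about the mean

# Arbitrary trial points; the excess over the variance should be [E(X) - c]^2.
for c in (0, 2, Fraction(7, 2), 5):
    excess = moment(DIE, 2, c) - variance
    print(c, moment(DIE, 2, c), excess, (mean - c)**2)
```

The excess is zero only at c = 7/2, the mean, so the second moment about the mean is the smallest of those computed.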

4.6. Computation of Moments. It is more convenient to use
frequencies than proportions in computing moments of samples
(or finite populations). Suppose that we have the following
sample data, each datum being the value of some member of
the sample: 0, 5, 2, 1, 0, 0, 3, 2, 2, 1, 4, 5, 4, 3, 1, 1, 5, 0, 2,
3, 3, 4, 2, 3, 2.

We collect these values into a frequency table as follows:

    x    freq (x)

    5       3
    4       3
    3       5
    2       6
    1       4
    0       4

Instead of finding the proportion f(x) for each value and then
taking m = Σx_i f(x_i) we observe that

m = Σx_i [freq (x_i)/n] = Σx[freq (x)] / n

The computation indicated by this formula can be conveniently
performed in our table:

    x    freq (x)    x[freq (x)]

    5       3           15
    4       3           12
    3       5           15
    2       6           12
    1       4            4
    0       4            0

    Σfreq (x) = 25    Σx[freq (x)] = 58


Similarly, for the variance we observe that

s^2 = Σx^2 f(x) - [Σxf(x)]^2
    = Σx^2[freq (x)] / n - {Σx[freq (x)]}^2 / n^2
    = (nΣx^2[freq (x)] - {Σx[freq (x)]}^2) / n^2

We compute both the mean and the variance in the following
table:

    x    x^2    freq (x)    x[freq (x)]    x^2[freq (x)]

    5    25        3            15              75
    4    16        3            12              48
    3     9        5            15              45
    2     4        6            12              24
    1     1        4             4               4
    0     0        4             0               0

    Σ             25            58             196

m = 58/25 = 2.32
s^2 = [25(196) - (58)^2] / (25)^2 = 2.4576
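
The tabular computation above translates directly into a short program. The sketch below is an added illustration: it collects the same 25 sample values into a frequency table and applies the formulas for m and s^2 in terms of frequencies; the variable names are chosen here for the example.

```python
from collections import Counter
from fractions import Fraction

data = [0, 5, 2, 1, 0, 0, 3, 2, 2, 1, 4, 5, 4, 3, 1, 1, 5, 0, 2,
        3, 3, 4, 2, 3, 2]

freq = Counter(data)                                # frequency table: value -> freq(x)
n = sum(freq.values())                              # sum of freq(x)
sum_x = sum(x * f for x, f in freq.items())         # sum of x[freq(x)]
sum_x2 = sum(x**2 * f for x, f in freq.items())     # sum of x^2[freq(x)]

mean = Fraction(sum_x, n)
variance = Fraction(n * sum_x2 - sum_x**2, n**2)

for x in sorted(freq, reverse=True):
    print(x, freq[x], x * freq[x], x**2 * freq[x])
print(n, sum_x, sum_x2, float(mean), float(variance))
```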


4.7. Transformations of Values for Computational Purposes.
Values are usually such that it is easier in computing the mean,
variance, and other moments to transform the values, compute
the desired moment, and then transform the computed moment
into the value it would have had if the transformation had not
been made. For example, suppose that we have the following
sample data:

       x       freq (x)

    11,071        2
    11,067        5
    11,063       11
    11,059        4
    11,055        1

To simplify computation we subtract 11,063 from each value
and divide the difference by 4; that is, we transform x into y so
that y = (x - 11,063)/4. We then compute the mean and
variance of Y as shown in the following table:

       x        y    y^2    freq (y)    y[freq (y)]    y^2[freq (y)]

    11,071      2     4        2            4               8
    11,067      1     1        5            5               5
    11,063      0     0       11            0               0
    11,059     -1     1        4           -4               4
    11,055     -2     4        1           -2               4

    Σ                         23            3              21

m_y = 3/23 = 0.1304
s_y^2 = [(23)(21) - 9] / (23)^2 = 0.89603



To obtain m_x and s_x^2 we observe that, since x = 4y + 11,063
[we can obtain this either directly from the table or by solving
the equation y = (x - 11,063)/4 for x],

m_x = Σx[freq (x)] / n = Σ(4y + 11,063)freq (y) / n
    = 4Σy[freq (y)] / n + 11,063 = 4m_y + 11,063

Therefore

m_x = 4(0.1304) + 11,063 = 11,063.522

Similarly,

s_x^2 = Σ(x - m_x)^2 freq (x) / n
      = Σ[4y + 11,063 - (4m_y + 11,063)]^2 freq (y) / n
      = Σ(4y - 4m_y)^2 freq (y) / n = 16Σ(y - m_y)^2 freq (y) / n = 16s_y^2

Therefore

s_x^2 = 16(0.89603) = 14.336

Instead of 11,063 we could have chosen any other value to
transform into 0, and we could have chosen any interval for
steps between y's. The student should be able to prove as an
exercise that if x = ay + b, in which a and b are any
constants, then m_x = am_y + b and s_x^2 = a^2 s_y^2 and, further, that
mom_{E(X)}^3 = a^3 mom_{E(Y)}^3 and mom_{E(X)}^4 = a^4 mom_{E(Y)}^4 and, more
generally, mom_{E(X)}^k = a^k mom_{E(Y)}^k.

The number which is transformed into 0 is usually called an 
arbitrary origin. 
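
The use of an arbitrary origin can be sketched as follows. This is an added illustration under the assumptions of the example above (origin 11,063 and step 4): the mean and variance are computed from the transformed values, transformed back, and compared with a direct computation. The function name `mean_and_variance` is hypothetical.

```python
from fractions import Fraction

# Sample data of the example: value -> frequency.
freq_x = {11071: 2, 11067: 5, 11063: 11, 11059: 4, 11055: 1}


def mean_and_variance(table):
    """Mean and variance of a frequency table, as exact fractions."""
    n = sum(table.values())
    mean = Fraction(sum(v * f for v, f in table.items()), n)
    variance = Fraction(sum(v**2 * f for v, f in table.items()), n) - mean**2
    return mean, variance


origin, step = 11063, 4                              # arbitrary origin and step
freq_y = {(x - origin) // step: f for x, f in freq_x.items()}

m_y, s2_y = mean_and_variance(freq_y)
m_x = step * m_y + origin                            # m_x = a m_y + b
s2_x = step**2 * s2_y                                # s_x^2 = a^2 s_y^2

print(float(m_y), float(s2_y))
print(float(m_x), float(s2_x))
print(mean_and_variance(freq_x) == (m_x, s2_x))
```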


EXERCISES 

4.7.1. Each of the following data is the length of time (in seconds) that it
took 1 subject in a psychological experiment to respond to the turning on of a
light by pressing a key:

.15  .11  .14  .23  .20  .11  .14  .25
.10  .15  .17  .16  .11  .13  .16  .15
.21  .14  .22  .15  .15  .21  .12  .13
.17  .20  .18  .12  .16  .19  .19  .18

Find each of the following statistics, if it exists:

a. The mode 

b. The median 

c. The arithmetic mean 

d. The variance 

e. The standard deviation 

4.7.2. Each of the following data is the intensity of a visual stimulus,
measured in millilamberts, which had to be reached before a subject could
recognize a word when it was exposed for .10 sec:

2.2  2.4  2.4  2.9  2.8  2.5  2.4
2.3  2.1  2.2  2.7  2.6  2.5  2.3
2.5  2.7  2.6  2.5  2.5  2.7  2.6
2.8  2.3  2.5  2.4  2.2  2.5  2.4
2.6  2.6  2.5  2.5  2.3  2.6  2.5

Find the statistics as in Exercise 4.7.1, a to e.

4 . 7 . 3 . Find a convenient computational formula for mom| (X) . 

4.8. Grouping of Data into Class Intervals. In most practical applications of statistics the data are spread out over a great many different values, so that computation becomes very laborious, even using the method given in Sec. 4.7. By grouping the data into class intervals and considering each datum as falling at the mid-point of its class interval, computation can be made much easier. For most data it has been found suitable to use between 10 and 20 classes. With fewer than 10 classes the calculation is usually not sufficiently accurate (obviously grouping throws away information); with more than 20 classes calculation is tedious. For example, suppose we have as data the number of cars passing over a certain bridge on each of 60 days selected at random from a large class of ordinary business days, and these numbers of cars range from 3,684 to 4,071, a range of 387. Following convention we therefore use some number between 20 and 39 as our class interval. If we use 25 as class interval it is convenient to take as mid-points 3,675, 3,700, 3,725, . . . , 4,075. The intervals are therefore 3,663-3,687, 3,688-3,712, . . . , 4,063-4,087. Any number falling between 3,663 and 3,687 (inclusive) is tabulated as 3,675, etc. We then proceed as in Sec. 4.7.
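
The tabulation step can be sketched in a few lines. This is an illustration under stated assumptions rather than a prescribed procedure: the data are integers, the class width is odd (so every integer has a unique nearest mid-point), and the seven car counts below are hypothetical values chosen only to exercise the interval boundaries named above.

```python
from collections import Counter

def group_into_classes(data, width, first_midpoint):
    """Tabulate each integer datum at the mid-point of its class interval.

    Assumes an odd class width, so the interval around a mid-point m
    is m - width//2 through m + width//2 inclusive.
    """
    half = width // 2
    counts = Counter()
    for x in data:
        k = (x - first_midpoint + half) // width   # index of the class interval
        counts[first_midpoint + k * width] += 1
    return dict(sorted(counts.items()))

def grouped_mean_variance(table):
    """Mean and variance (divisor n) treating every datum as its mid-point."""
    n = sum(table.values())
    mean = sum(mid * f for mid, f in table.items()) / n
    var = sum((mid - mean) ** 2 * f for mid, f in table.items()) / n
    return mean, var

# Hypothetical daily car counts (not from the text)
cars = [3684, 3690, 3712, 3713, 3925, 4063, 4071]
table = group_into_classes(cars, width=25, first_midpoint=3675)
print(table)
print(grouped_mean_variance(table))
```

With width 25 and first mid-point 3,675, the value 3,712 falls in the interval 3,688-3,712 and is tabulated as 3,700, while 3,713 is tabulated as 3,725.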
Example. The following are IQ scores of 200 children.

122  110   99  112  118   88   96  103  105   98
 97  101  114  108   92  115  134  107   98  103
 98  115   97  129  107  102   90  105  104  131
106  112   92   91   96   94  100  105  111  100
 83  127   98   92   99  111  111  115  116  124
 97   98  126  156  101  129  106  120  103  118
 91  102  113  100   85  113   97  104   94  102
111  101  145  108  100   95  107  119  109   97
126   94   99   95  104   99  110  103  101   99
112  113  100  117  100  104  122  111  102  111
117  109  107  100  112  100   98   96  109  108
 93  105  100   97  100  109  123  110  100   92
106  108   99  107  101  101  157   94   88  110
 97  103  148  114  107  102  142  100  101  119
126  109  114   96  103  161  118  107  104   92
115  124   96  111  136   94  107  116   92  102
101   95  105  103   98  108  105  113  117  113
116  116  129  104   95   97  139  110  120  105
100   99   92   95  100  107  102  121  119  128
119  112  106  100  111   99  117  108  104  117

The highest IQ is 161, the lowest 83, giving a range of 78. By taking an interval of length 5, we obtain about 16 intervals. It is convenient to take multiples of 5 as mid-points. Our grouped distribution together with a transformation and calculation of moments is shown in the following table:


IQ (mid-point)    y     y^2    freq(y)   y·freq(y)   y^2·freq(y)
160               11    121    1         11          121
155               10    100    2         20          200
150               9     81     1         9           81
145               8     64     1         8           64
140               7     49     2         14          98
135               6     36     2         12          72
130               5     25     5         25          125
125               4     16     7         28          112
120               3     9      13        39          117
115               2     4      27        54          108
110               1     1      21        21          21
105               0     0      34        0           0
100               -1    1      45        -45         45
95                -2    4      25        -50         100
90                -3    9      12        -36         108
85                -4    16     2         -8          32
Σ                              200       102         1,404

$$m_y = \frac{102}{200} = 0.51$$

$$s_y^2 = \frac{200(1{,}404) - (102)^2}{(200)^2} = 6.7599$$

From the table we see that $x = 5y + 105$. Thus,

$$m_x = 5m_y + 105 = 107.55 \quad\text{and}\quad s_x^2 = 25s_y^2 = 168.9975.$$




Chapter 5 

HYPERGEOMETRIC AND BINOMIAL 
DISTRIBUTIONS 


5.1. Hypergeometric Distributions. We have already considered cases in which we draw a random sample of n from a population having N members, of which $N_1$ are of one kind and $N - N_1$ are of another kind, for example, drawing a random sample of 4 chips from a bowl containing 6 orange chips and 2 black chips. We saw that the expected sampling distribution (probability distribution) in any such case is given by $f(x) = C_x^{N_1} C_{n-x}^{N-N_1} / C_n^N$, in which x is the number of members in the sample which are of the first kind. Any distribution given by a rule of this form is a hypergeometric distribution.

Definition.¹ A hypergeometric distribution is any distribution given by a rule of the form $f(x) = C_x^{N_1} C_{n-x}^{N-N_1} / C_n^N$. The expected sampling distribution of the number of A’s in a sample of size n from a population of size N such that $N_1$ are A’s and the rest are non-A’s is a hypergeometric distribution.

Example. The expected sampling distribution of the number of orange chips in a sample of 4 chips from a bowl containing 6 orange and 2 nonorange chips is given by $f(x) = C_x^6 C_{4-x}^2 / C_4^8$.
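
As an illustration, the rule in the example can be evaluated directly. The sketch below is not from the text; the function name `hypergeom_pmf` is an assumption, and Python's `math.comb(n, k)` plays the role of $C_k^n$ (it returns 0 when k exceeds n, which handles impossible values of x).

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, n, N1, N):
    """f(x) = C(N1, x) * C(N - N1, n - x) / C(N, n), as an exact fraction."""
    return Fraction(comb(N1, x) * comb(N - N1, n - x), comb(N, n))

# Sample of 4 chips from a bowl of 6 orange and 2 nonorange chips
dist = {x: hypergeom_pmf(x, n=4, N1=6, N=8) for x in range(0, 5)}
for x, fx in dist.items():
    print(x, fx)
```

Since only 2 chips are nonorange, samples with 0 or 1 orange chips cannot occur, and the rule assigns them probability 0.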

5.2. Moments of a Hypergeometric Distribution. It can be shown that the first moment about the origin, that is, the arithmetic mean, is $nN_1/N$, as follows.

First we note that, because of the way in which the hypergeometric expression was derived,

$$\sum_{x=0}^{n} \frac{C_x^{N_1} C_{n-x}^{N-N_1}}{C_n^N} = 1$$

Thus

$$\sum_{x=0}^{n} C_x^{N_1} C_{n-x}^{N-N_1} = C_n^N = C_{x+n-x}^{N_1+N-N_1}$$

Let us write this result

$$\sum_{y=0}^{m} C_y^{b} C_{m-y}^{k} = C_m^{b+k}$$

in which m, b, and k are positive integers. Then

$$\begin{aligned}
E(X) &= \sum_{x=0}^{n} \frac{x\,C_x^{N_1} C_{n-x}^{N-N_1}}{C_n^N}
= \frac{1}{C_n^N} \sum_{x=1}^{n} x\,\frac{N_1!}{x!\,(N_1-x)!}\,C_{n-x}^{N-N_1} \\
&= \frac{N_1}{C_n^N} \sum_{x=1}^{n} \frac{(N_1-1)!}{(x-1)!\,(N_1-x)!}\,C_{n-x}^{N-N_1}
= \frac{N_1}{C_n^N} \sum_{x-1=0}^{n-1} C_{x-1}^{N_1-1} C_{n-x}^{N-N_1}
\end{aligned}$$

Note that if we let $y = x - 1$, $m = n - 1$, $b = N_1 - 1$, and $k = N - N_1$ the summation is in the form $\sum_{y=0}^{m} C_y^{b} C_{m-y}^{k}$. Thus we have

$$E(X) = \frac{N_1}{C_n^N}\,C_{n-1}^{N-1} = \frac{N_1\,n!\,(N-n)!}{N!} \cdot \frac{(N-1)!}{(n-1)!\,(N-n)!} = \frac{nN_1}{N}$$

In a similar but slightly more complicated way it can be shown that the second moment about the mean, that is, the variance, is

$$\frac{nN_1(N - N_1)(N - n)}{N^2(N - 1)}$$

¹ This is the special case of one variable; our definition can easily be generalized to any number of variables.
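
The two moment formulas can be checked numerically for a particular case. The following sketch is an illustration only (the function names are assumptions, not the book's); it computes the mean and variance of the chip example of Sec. 5.1 directly from the distribution and compares them with $nN_1/N$ and $nN_1(N-N_1)(N-n)/[N^2(N-1)]$.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, n, N1, N):
    """f(x) = C(N1, x) * C(N - N1, n - x) / C(N, n)."""
    return Fraction(comb(N1, x) * comb(N - N1, n - x), comb(N, n))

def moments_from_rule(n, N1, N):
    """Mean and variance computed term by term from the distribution."""
    f = {x: hypergeom_pmf(x, n, N1, N) for x in range(n + 1)}
    mean = sum(x * fx for x, fx in f.items())
    var = sum((x - mean) ** 2 * fx for x, fx in f.items())
    return mean, var

def moments_from_formula(n, N1, N):
    """Closed forms stated in this section."""
    mean = Fraction(n * N1, N)
    var = Fraction(n * N1 * (N - N1) * (N - n), N ** 2 * (N - 1))
    return mean, var

print(moments_from_rule(4, 6, 8))      # 4 chips from 6 orange, 2 nonorange
print(moments_from_formula(4, 6, 8))
```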

5.3. Binomial Distributions. In addition to the case in which we draw a random sample from a finite population, we have also in Chap. 3 considered the case in which we draw a sample of 1, replace it, draw another sample of 1, replace it, etc., drawing as many times as we wish. We considered our final sequence of n draws as a random sample of size 1 from the population of all possible sequences of n draws. In the population containing $N_1$ A’s and $N_2$ ($= N - N_1$) non-A’s, the number of ways of drawing x A’s and then n - x non-A’s (replacing after each draw) is $(N_1)^x (N - N_1)^{n-x}$; similarly, the number of ways of drawing x - 1 A’s and then a non-A and then another A and then n - x - 1 non-A’s is also $(N_1)^x (N - N_1)^{n-x}$; there are as many such kinds of sequences as there are ways of choosing x positions out of n positions, that is, $C_x^n$; therefore, the number of ways of drawing x A’s and n - x non-A’s in a sequence of n draws is $C_x^n (N_1)^x (N - N_1)^{n-x}$. As there are $N^n$ possible sequences in all, the probability of drawing x A’s in a sequence of n draws is

$$\frac{C_x^n (N_1)^x (N - N_1)^{n-x}}{N^n}$$

We summarize this result in the following theorem.

Theorem 5.3. Consider the population consisting of all possible sequences of n random draws (with replacement after each draw) from a set in which $N_1$ are A’s and $N - N_1$ are non-A’s, each sequence having as value the number of A’s drawn. The distribution of the population of sequences is given by

$$f(x) = \frac{C_x^n (N_1)^x (N - N_1)^{n-x}}{N^n}$$

It is convenient to rewrite this formula as follows:

$$f(x) = C_x^n p^x q^{n-x}$$

in which p and q (= 1 - p) are the proportions of A’s and non-A’s.

Examples. 1. Suppose we plan to flip a penny 10 times. What is the probability of obtaining 6 heads? In this case $N = 2$; $N_1 = 1$; $n = 10$; $x = 6$. Thus $f(6) = C_6^{10} (1/2)^{10}$.

2. One-third of an unknown number of chips in a bowl are white. What is the probability that 15 chips out of 35 drawn at random (replacing after each draw) will be white? Ans. $C_{15}^{35} (1/3)^{15} (2/3)^{20}$.
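
The two answers above can be evaluated numerically. This is a minimal sketch, not part of the text; `binom_pmf` is an illustrative name, and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * q^(n - x) with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Example 1: 6 heads in 10 flips of a penny
heads = binom_pmf(6, 10, Fraction(1, 2))
print(heads, float(heads))

# Example 2: 15 white chips in 35 draws with replacement, p = 1/3
white = binom_pmf(15, 35, Fraction(1, 3))
print(float(white))
```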


Another way of obtaining the term $C_x^n p^x q^{n-x}$ is by reasoning as follows. The probability of obtaining an A on each draw is p, and that of obtaining a non-A is q. Therefore, in accordance with the multiplication of probabilities of independent events, the probability of drawing a particular sequence containing x A’s and n - x non-A’s is $p^x q^{n-x}$. This is the probability for only one sequence; there are $C_x^n$ such sequences; therefore, in accordance with the addition of probabilities of mutually exclusive events, the probability of obtaining any sequence containing x A’s is $C_x^n p^x q^{n-x}$.

By the same line of reasoning we can easily show that $C_x^n p^x q^{n-x}$ is the general term in the expansion of $(p + q)^n$, which is the binomial expansion. Consider

$$(p + q)^n = (p + q)_1 (p + q)_2 \cdots (p + q)_n$$

If we multiply these n factors together, we will obtain $pp \cdots p + pp \cdots pq + pp \cdots pqp + \cdots + qq \cdots qp + qq \cdots q$, such that each term is obtained by choosing either p or q from each of the n factors. [For example,

$$(p + q)^3 = ppp + ppq + pqp + qpp + pqq + qpq + qqp + qqq]$$

We now ask, how many of these terms contain p as a factor x times and q as a factor n - x times? As we have one term for each sequence of n choices, obviously there are as many terms containing p as a factor x times as there are ways of choosing p x times, that is, of choosing x of the n factors enclosed in parentheses, that is, $C_x^n$. Therefore the coefficient of $p^x q^{n-x}$ is $C_x^n$ and we obtain

$$(p + q)^n = \sum_{x=0}^{n} C_x^n p^x q^{n-x}$$

If the reader is disturbed by the fact that this series writes the expansion backward, he can instead write

$$(p + q)^n = \sum_{x=0}^{n} C_{n-x}^n p^{n-x} q^x$$

Definition. A binomial distribution is any distribution given by a rule of the form $f(x) = C_x^n p^x q^{n-x}$.

The reader should not rely on this abstract definition of a binomial distribution; instead he should keep firmly in mind the type of situation in which binomial distributions arise, the type described in Theorem 5.3. This type of situation arises frequently in many fields of research.

5.4. Moments of a Binomial Distribution. In the type of situation referred to in Theorem 5.3, what is E(X)? Intuitively we feel that a sequence of n random draws should “on the average” contain the same proportion of A’s as there are in the set, that is, $N_1/N$ ($= p$). Thus the mean number of A’s in a sequence of n draws should be np. We can obtain the result $E(X) = np$ for all binomial distributions in a rigorous manner as follows:

$$\begin{aligned}
E(X) &= \sum_{x=0}^{n} x\,C_x^n p^x q^{n-x} = 0 + \sum_{x=1}^{n} x\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} \\
&= \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\,p^x q^{n-x} \\
&= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\,p^{x-1} q^{n-x} \\
&= np \sum_{x-1=0}^{n-1} \frac{(n-1)!}{(x-1)!\,[(n-1)-(x-1)]!}\,p^{x-1} q^{(n-1)-(x-1)} \\
&= np(p + q)^{n-1} = np(1)^{n-1} = np \qquad \text{Q.E.D.}
\end{aligned}$$

If the student does not see that

$$\sum_{x-1=0}^{n-1} \frac{(n-1)!}{(x-1)!\,[(n-1)-(x-1)]!}\,p^{x-1} q^{(n-1)-(x-1)} = (p + q)^{n-1}$$

it will help him to rewrite the expression letting $y = x - 1$ and $m = n - 1$.

Similarly we can prove that the variance is npq. This proof involves a few algebraic tricks like the above, but followed step by step, it is actually very simple. One trick which may be new to the student is writing $x^2 - x + x$ for $x^2$. In higher mathematics such apparently useless expansions are often found extremely helpful. First we obtain $E(X^2)$ and then we use the formula $\mathrm{Var} = E(X^2) - [E(X)]^2$.

$$\begin{aligned}
E(X^2) &= \sum_{x=0}^{n} x^2\,C_x^n p^x q^{n-x} = \sum_{x=0}^{n} (x^2 - x + x)\,C_x^n p^x q^{n-x} \\
&= \sum_{x=0}^{n} x(x-1)\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} + \sum_{x=0}^{n} x\,C_x^n p^x q^{n-x} \\
&= 0 + 0 + \sum_{x=2}^{n} x(x-1)\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,(n-x)!}\,p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x-2=0}^{n-2} \frac{(n-2)!}{(x-2)!\,[(n-2)-(x-2)]!}\,p^{x-2} q^{(n-2)-(x-2)} + np \\
&= n(n-1)p^2 (p + q)^{n-2} + np = n(n-1)p^2(1) + np \\
&= n^2p^2 - np^2 + np = n^2p^2 + np - np^2 = n^2p^2 + np(1 - p) = n^2p^2 + npq
\end{aligned}$$

$$\mathrm{Var} = n^2p^2 + npq - (np)^2 = npq \qquad \text{Q.E.D.}$$
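
The results E(X) = np and Var = npq can be confirmed for any particular n and p by summing over the distribution. The sketch below is illustrative only; `binom_moments` is an assumed name, and it follows the same route as the proof, computing $E(X^2)$ first and then subtracting $[E(X)]^2$.

```python
from fractions import Fraction
from math import comb

def binom_moments(n, p):
    """Mean and variance of a binomial, summed term by term."""
    q = 1 - p
    f = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * fx for x, fx in enumerate(f))
    second = sum(x * x * fx for x, fx in enumerate(f))   # E(X^2)
    return mean, second - mean ** 2                      # Var = E(X^2) - [E(X)]^2

n, p = 10, Fraction(1, 3)
mean, var = binom_moments(n, p)
print(mean, var)            # compare with n*p and n*p*(1 - p)
```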


EXERCISES 

5.4.1. For each of the following statistics, indicate whether its expected sampling distribution is a hypergeometric distribution, a binomial distribution, or neither. Write a rule for the distribution if it is hypergeometric or binomial. Write an expression for the mean and variance of each expected sampling distribution, if it is hypergeometric or binomial.

a. The number of high-anxiety-level students in a sample of 20 drawn at random from a population of 600 students.

b. The number of 6’s in 50 throws of an ordinary die.

c. The number of 6’s and 5’s in 50 throws of an ordinary die.

d. The number of 8’s in 50 throws of two ordinary dice.

e. The number of correct answers on a true-false test of 100 items, for a student who does not understand any of the questions.

f. The number of hearts drawn in 10 draws (without replacement) from an ordinary pack of 52 cards.

g. The number of heads-tails-heads-tails-heads sequences in 50 sequences of 5 flips each of an ordinary coin.

h. The number of black chips drawn on the third draw in the following situation. There are 3 black and 4 white chips in urn A and 5 black and 7 white chips in urn B. A chip is drawn at random from A and placed into B. A chip is then drawn at random from B and placed into A. A chip is then drawn at random from A.

i. The number of black chips drawn on the second draw in the situation described in h.

j. The mean (arithmetic) diameter of oranges in a sample of 20 taken at random from an orchard of 100,000 oranges.

5.4.2. Follow instructions as in Exercise 5.4.1.

a. The number of names of distinguished scientists in a sample of 20 different names taken at random from a biographical dictionary which contains the names of 20,000 scientists, of whom 500 are distinguished.

b. The number of married students in a sample of 50 from a college containing both married and unmarried students.

c. The number of a’s typed by a monkey hitting a typewriter keyboard 30 times at random, there being 40 keys on the keyboard.

d. The number of correct answers to a multiple-choice question (with 4 answers) in a sample of 25 papers if the students were answering at random.

e. The number of defective tubes in a random sample of 100, if the factory output is .5 per cent defective.

f. The number of draws until an ace is drawn from a pack of 52 cards of which 4 are aces, without replacing after each draw.

g. The number of draws until an ace is drawn, replacing after each draw.

h. The proportion of positive reactions to a vaccine given to n army recruits.

5.4.3. Find the mean and variance of a dichotomous population of which p members have the value 1 and q (= 1 - p) members have the value 0.

5.4.4. Prove that the variance of the expected sampling distribution of the mean of a sample drawn at random from the dichotomous population described in Exercise 5.4.3 is pq/n, in which n is the size of the sample and the population is so large that we can neglect the fact of depletion.

5.4.5. Prove that the variance of the expected sampling distribution of the proportion of A’s drawn from a dichotomous population (neglecting depletion) is pq/n by making use of the theorem mentioned in Sec. 4.7.

5.4.6. A radioactive substance emits electrons independently of one another, that is, whether an electron has or has not been emitted in the recent past has no influence on emissions in the present. Suppose that a careful record has been made of the times of electron emission for a given piece of radioactive material, and that during a given time interval L exactly N electrons have been emitted. Suppose we plan to sample the records by choosing at random a subinterval l of L. What is the probability that we shall find that 10 electrons have been emitted during l?

5.5. Binomial as Limit of Hypergeometric. Suppose that we draw a random sample of size n from a dichotomous population of size N such that N is very large in comparison with n. In this case it does not matter appreciably whether we draw a sample of size n or whether we draw a sequence of size n, replacing after each draw, because the depletion of the population by a few members does not appreciably affect what happens on the next draw. If, for example, N = 800,000 and n = 30, then no matter what the value of p (the proportion of A’s in the population), the sampling distribution of X (the number of A’s in a sample) for samples of size n is approximately the same as the distribution of all possible sequences of n draws from a population in which the proportion of A’s is p, replacing after each draw. This means that, with a fixed sample of size n and with a fixed proportion p, as the population size N gets larger and larger the hypergeometric distribution approaches closer and closer to the binomial distribution with the same n and p. We can prove this in a rigorous manner as follows.

Given a dichotomous population of size N and with a proportion p of A’s, the sampling distribution of X, the number of A’s, for samples of size n is given by

$$\begin{aligned}
f(x) &= \frac{C_x^{pN} C_{n-x}^{N-pN}}{C_n^N}
= \frac{\dfrac{(pN)!}{x!\,(pN-x)!} \cdot \dfrac{(N-pN)!}{(n-x)!\,(N-pN-n+x)!}}{\dfrac{N!}{n!\,(N-n)!}} \\
&= \frac{(pN)!}{x!\,(pN-x)!} \cdot \frac{(N-pN)!}{(n-x)!\,(N-pN-n+x)!} \cdot \frac{n!\,(N-n)!}{N!} \\
&= \frac{n!}{x!\,(n-x)!} \cdot \frac{(pN)!}{(pN-x)!} \cdot \frac{(N-pN)!}{(N-pN-n+x)!} \cdot \frac{(N-n)!}{N!} \\
&= C_x^n\,\frac{pN(pN-1)\cdots(pN-x+1)\,(N-pN)(N-pN-1)\cdots(N-pN-n+x+1)}{N(N-1)\cdots(N-n+1)} \\
&= C_x^n\,\frac{pN(pN-1)\cdots(pN-x+1)}{N(N-1)\cdots(N-x+1)} \cdot \frac{(N-pN)(N-pN-1)\cdots(N-pN-n+x+1)}{(N-x)(N-x-1)\cdots(N-n+1)}
\end{aligned}$$

Noting that $N - n + 1 = N - x - n + x + 1$, we can write the above expression as

$$C_x^n \cdot \frac{pN}{N} \cdot \frac{pN-1}{N-1} \cdots \frac{pN-x+1}{N-x+1} \cdot \frac{N-pN}{N-x} \cdot \frac{N-pN-1}{N-x-1} \cdots \frac{N-pN-n+x+1}{N-x-n+x+1}$$

Dividing the numerator and denominator of each term by N, we obtain

$$C_x^n \cdot \frac{p}{1} \cdot \frac{p - \frac{1}{N}}{1 - \frac{1}{N}} \cdots \frac{p - \frac{x-1}{N}}{1 - \frac{x-1}{N}} \cdot \frac{1-p}{1 - \frac{x}{N}} \cdot \frac{1 - p - \frac{1}{N}}{1 - \frac{x}{N} - \frac{1}{N}} \cdots \frac{1 - p - \frac{n-x-1}{N}}{1 - \frac{x}{N} - \frac{n-x-1}{N}}$$

All we have done so far is to rewrite the hypergeometric expression; now observe that as $N \to \infty$ (approaches infinity) each of the terms $\frac{1}{N}, \frac{2}{N}, \ldots, \frac{x-1}{N}, \frac{x}{N}, \ldots, \frac{n-x-1}{N} \to 0$. Therefore each of the terms

$$\frac{p - \frac{1}{N}}{1 - \frac{1}{N}}, \;\ldots,\; \frac{p - \frac{x-1}{N}}{1 - \frac{x-1}{N}} \to p$$

and each of the terms

$$\frac{1-p}{1 - \frac{x}{N}}, \;\frac{1 - p - \frac{1}{N}}{1 - \frac{x}{N} - \frac{1}{N}}, \;\ldots,\; \frac{1 - p - \frac{n-x-1}{N}}{1 - \frac{x}{N} - \frac{n-x-1}{N}} \to 1 - p = q$$

Therefore, the entire expression approaches $C_x^n p^x q^{n-x}$. Q.E.D.

This example should give the student some idea of what the mathematical statistician means when he says that one distribution approaches another as a limiting form or that in the limit the distribution of a certain random variable is a certain distribution. Unfortunately most of such proofs are far beyond the scope of this text; the above is one of the most elementary examples possible.
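
The limiting behavior can also be watched numerically. The sketch below is an illustration, not part of the proof: the choice n = 5, p = 0.4 and the list of population sizes are arbitrary assumptions, and the quantity printed is the largest absolute difference between the hypergeometric and binomial probabilities over x = 0, ..., n.

```python
from math import comb

def hypergeom_pmf(x, n, N1, N):
    """f(x) = C(N1, x) * C(N - N1, n - x) / C(N, n)."""
    return comb(N1, x) * comb(N - N1, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.4                      # fixed sample size and proportion (arbitrary)
max_diffs = []
for N in (10, 100, 1000, 10000):   # growing population size
    N1 = round(p * N)              # number of A's, so that N1 / N = p
    diff = max(abs(hypergeom_pmf(x, n, N1, N) - binom_pmf(x, n, p))
               for x in range(n + 1))
    max_diffs.append(diff)
    print(N, diff)
```

The printed differences shrink as N grows with n and p held fixed, which is what the proof asserts in the limit.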


EXERCISES 

5.5.1. Find the binomial approximation for each of the following hypergeometric distributions:

a. $f(x) = \dfrac{C_x^{100} C_{50-x}^{200}}{C_{50}^{300}}$

b. $f(x) = \dfrac{C_x^{N_1} C_{n-x}^{N-N_1}}{C_n^N}$

5.5.2. Find a hypergeometric distribution for which each of the following binomial distributions could be approximations:

a. $f(x) = C_x^{100} (.3)^x (.7)^{100-x}$

b. $f(x) = C_x^n p^x q^{n-x}$

5.5.3. If the proportion of Bostonians in favor of a certain bill is p, what is the probability that fewer than 10 in a random sample of 25 will favor the bill?

5.5.4. It is known that one-fifth of the students in a large university are members of ZXZXZ. A student committee is to be selected at random. How large a committee must be selected to make the probability of having fewer than two members of ZXZXZ on the committee less than .01?


5.6. Fitting a Binomial to Sample Data. We may have theoretical or empirical reasons for believing (or doubting) that a certain population has a binomial distribution. If a random sample from the population is available, we can examine the sample distribution to see how similar it is to a binomial. For this purpose we construct a binomial distribution which is like our sample distribution in some respects, and then we compare the two distributions, observed and constructed. Let the sample distribution be [f(0),0], [f(1),1], . . . , [f(n),n], with mean m. Since in a binomial distribution E(X) = np, we set m = np, solve for p, and use n and p to construct a binomial distribution to compare with the sample. The constructed binomial distribution will thus have the same mean as our sample, but if the distribution sampled is not a binomial, the constructed binomial distribution may differ greatly from the sample distribution in other respects.

Examples. 1. A large random sample has the distribution (.01,0), (.09,1), (.39,2), (.51,3). Then m = 0(.01) + 1(.09) + 2(.39) + 3(.51) = 2.40; therefore p = m/n = 2.40/3 = .80. Using p = .80, n = 3 we obtain the binomial distribution (.008,0), (.096,1), (.384,2), (.512,3), which agrees very closely with our sample distribution.

2. A large random sample has the distribution (.071,0), (.232,1), (.605,2), (.092,3). Then m = 1.718; p = .573. Using these two values we obtain the binomial distribution (.078,0), (.314,1), (.420,2), (.187,3), which departs considerably from our sample distribution.

In some cases we may hypothesize, for theoretical reasons, a value of p for the population, in which case we use the hypothesized value instead of that obtained from the sample mean in fitting a binomial to the sample. For example, if we toss a coin “at random” for 100 sequences of 4 tosses each, we would use n = 4, p = 1/2 in comparing our 100 observations with those theoretically expected.

In Chap. 10 we shall discuss a method for determining whether the departure of our sample distribution from that theoretically expected can reasonably be attributed to random sampling.
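
The fitting procedure of this section (set m = np, solve for p, construct the binomial) can be sketched as follows. The function name `fit_binomial` and the dictionary layout, mapping each value x to its sample proportion f(x), are assumptions made for the illustration; the data are those of Example 1.

```python
from math import comb

def fit_binomial(sample_dist, n):
    """Fit a binomial with the same mean as the sample distribution.

    sample_dist maps each value x (0..n) to its sample proportion f(x).
    """
    m = sum(x * fx for x, fx in sample_dist.items())   # sample mean
    p = m / n                                          # from m = n * p
    fitted = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}
    return p, fitted

# Sample distribution of Example 1
sample = {0: 0.01, 1: 0.09, 2: 0.39, 3: 0.51}
p, fitted = fit_binomial(sample, n=3)
print(round(p, 2))
for x in range(4):
    print(x, sample[x], round(fitted[x], 3))
```

To fit with a hypothesized p instead, as in the coin-tossing case, one would skip the estimate and construct the binomial from the given n and p directly.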


EXERCISES 

5.6.1. Fit a binomial to the following sample data (each datum being the value of an observation):

5.6.2. Fit a binomial to the following sample data:

Value   Frequency   Value   Frequency
5       1           2       98
4       13          1       100
3       47          0       41
41 
Chapter 6

POISSON DISTRIBUTIONS


6.1. A Poisson as an Approximation to a Binomial. In many problems which can be solved by the binomial formula the computations become tedious; for example, computing a term such as $C_8^{75}\,p^8 q^{67}$ is quite a task. Fortunately there are ways of approximating probabilities for a binomial distribution which involve very little computation; one of these can be used when n is large and p is small; that is, the larger the size of n and the smaller the size of p, the better the approximation. We shall now derive this approximation by starting with the general term in the binomial expansion and letting $n \to \infty$ and $p \to 0$ in such a way that their product np remains constant.

$$C_x^n p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x} = \frac{n(n-1)\cdots(n-x+1)}{x!}\,p^x (1-p)^{n-x}$$

Let $a = np$; then, replacing p by a/n we obtain

$$\frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{a}{n}\right)^x \left(1 - \frac{a}{n}\right)^{n-x} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-x+1}{n} \cdot \frac{a^x}{x!}\left(1 - \frac{a}{n}\right)^n \left(1 - \frac{a}{n}\right)^{-x}$$

With x and a fixed, as n approaches infinity each of the terms $\frac{n-1}{n}, \frac{n-2}{n}, \ldots, \frac{n-x+1}{n}$ has 1 as its limit; thus their product has 1 as its limit. We state without proof that the term $(1 - a/n)^n$ has $e^{-a}$ as a limit.¹ The term $(1 - a/n)^{-x}$ has 1 as a limit, as the term within parentheses has 1 as a limit and the exponent is fixed. Therefore, the limit of the total expression is $a^x e^{-a}/x!$.

¹ The numerical value of e is approximately 2.718. It is the natural logarithmic base.

The reason that the approximation is better the larger the size of n and the smaller the size of p can be seen from the derivation. We let $n \to \infty$ as a (that is, np) remained fixed; therefore, as $n \to \infty$, p must approach 0. The reader may be surprised, however, to find that the approximation is as good as it is with relatively small sizes of n and relatively large values of p.
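
A side-by-side comparison of the binomial and its Poisson approximation is easy to compute. The following sketch is illustrative only; the function names are assumptions, and the two calls use n = 3 with p = 1/3 and p = .1, small cases chosen so the output can be read at a glance.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, a):
    """f(x) = a^x * e^(-a) / x!"""
    return a ** x * exp(-a) / factorial(x)

def compare(n, p):
    """Rows of (x, binomial f(x), Poisson approximation) with a = n * p."""
    a = n * p
    return [(x, round(binom_pmf(x, n, p), 3), round(poisson_pmf(x, a), 3))
            for x in range(n + 1)]

for row in compare(3, 1 / 3):
    print(row)
for row in compare(3, 0.1):
    print(row)
```

The second comparison, with the smaller p, shows the closer agreement, in line with the remark that the approximation improves as p decreases.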

Examples. 1. Consider the binomial distribution corresponding to $(1/3 + 2/3)^3$. Since a = np = 1, the approximating Poisson distribution is given by $f(x) = 1^x e^{-1}/x!$. Part of each distribution is given in Table 6.1.1. This approximation is not very good because n is not large nor is p small.

Table 6.1.1. Poisson Approximation to Binomial for p = 1/3, n = 3

x    f(x) (binomial)    f(x) (Poisson approximation)
0    .296               e^-1 = .368
1    .444               e^-1 = .368
2    .222               e^-1/2 = .184
3    .037               e^-1/6 = .061

2. The binomial distribution corresponding to $(.1 + .9)^3$ is approximated by $f(x) = .3^x e^{-.3}/x!$, as shown in Table 6.1.2. This approximation is much better because p is considerably smaller than in Example 1.

Table 6.1.2. Poisson Approximation to Binomial for p = 1/10, n = 3

x    f(x) (binomial)    f(x) (Poisson approximation)
0    .729               .741
1    .243               .222
2    .027               .033
3    .001               .003

3. The binomial distribution corresponding to $(1/3 + 2/3)^{10}$ is approximated by $f(x) = (10/3)^x e^{-10/3}/x!$, as shown in Table 6.1.3. This approximation is somewhat better than that in Example 1 because n is larger.

Table 6.1.3. Poisson Approximation to Binomial for p = 1/3, n = 10

x     f(x) (binomial)    f(x) (Poisson approximation)
0     .017               .036
1     .086               .119
2     .195               .198
3     .260               .220
4     .226               .183
5     .137               .122
6     .057               .070
7     .016               .032
8     .003               .013
9     .000               .004
10    .000               .001

4. The binomial distribution corresponding to $(.1 + .9)^{10}$ can be approximated by a Poisson distribution more closely than any of the distributions mentioned in the three examples above.

It should be obvious to the student that by “good approximation” we mean that the differences between proportions are small fractions of 1. For example, in Example 3 the largest difference is .043 [for f(4)]; the difference may be, however, and usually is in the case of extreme values, a very large proportion of f(x). For example, f(10) in Example 3 is .00002, whereas the approximation is .00124, a very large error relative to f(10).

EXERCISES 

6.1.1. Write the expression for the Poisson approximations to the following binomial distributions:

«• fix) = C 2 x °m 0 )*(i% 0 yo- X 
b. $f(x) = C_x^n p^x q^{n-x}$

6.1.2. Write the expressions for the binomial approximations and then the Poisson approximations to the following hypergeometric distributions:

a. $f(x) = \dfrac{C_x^{5{,}000} C_{100-x}^{1{,}000{,}000}}{C_{100}^{1{,}005{,}000}}$

b. $f(x) = \dfrac{C_x^{N_1} C_{n-x}^{N-N_1}}{C_n^N}$

6.2. Computation of a Poisson. The computation of a Poisson distribution is greatly simplified by observing that

$$f(x + 1) = \frac{a^{x+1} e^{-a}}{(x+1)!} = \frac{a^x e^{-a}}{x!} \cdot \frac{a}{x+1} = f(x)\,\frac{a}{x+1}$$

In other words, we can calculate f(0), then $f(1) = f(0)\,\frac{a}{1}$, $f(2) = f(1)\,\frac{a}{2}$, $f(3) = f(2)\,\frac{a}{3}$, etc. This kind of method, which is often used in mathematics, is called an iterative or recursive method.
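
The recursion translates directly into a loop. The sketch below is an illustration (the name `poisson_table` is an assumption): it starts from $f(0) = e^{-a}$ and multiplies by a/(x + 1) at each step, so no factorial or power is ever computed explicitly.

```python
from math import exp, factorial

def poisson_table(a, x_max):
    """Return [f(0), f(1), ..., f(x_max)] using f(x + 1) = f(x) * a / (x + 1)."""
    probs = [exp(-a)]                      # f(0) = e^(-a)
    for x in range(x_max):
        probs.append(probs[-1] * a / (x + 1))
    return probs

table = poisson_table(2.0, 5)
for x, fx in enumerate(table):
    # Second column: the same value from the direct formula, for comparison
    print(x, fx, 2.0 ** x * exp(-2.0) / factorial(x))
```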

6.3. Definition. So far we have considered $a^x e^{-a}/x!$ merely as a way of approximating $C_x^n p^x q^{n-x}$. We can, however, consider $a^x e^{-a}/x!$ as giving the distribution of a purely mathematical or conceptual population of infinitely many members.

Definition. A Poisson distribution is a distribution given by a rule of the form $f(x) = a^x e^{-a}/x!$, in which the values are 0 and all positive integers.

Example. The rule $f(x) = 2^x e^{-2}/x!$ gives the Poisson distribution $(e^{-2}, 0)$, $(2e^{-2}, 1)$, $(2e^{-2}, 2)$, $(4e^{-2}/3, 3)$, . . . .

Our definition assumes, of course, that $\sum_{x=0}^{\infty} a^x e^{-a}/x! = 1$. This is a necessary consequence of the way in which we derived the Poisson; further, it follows immediately from the relation

$$\sum_{x=0}^{\infty} \frac{c^x}{x!} = e^c$$

in which c is any positive number:

$$\sum_{x=0}^{\infty} \frac{a^x e^{-a}}{x!} = e^{-a} \sum_{x=0}^{\infty} \frac{a^x}{x!} = e^{-a} e^{a} = \frac{e^a}{e^a} = 1$$

6.4. Moments of a Poisson Distribution. It is a necessary consequence of the way in which we derived the Poisson that the mean is a and the variance is also a (because as $n \to \infty$, np remains fixed; thus $p \to 0$ and $q \to 1$; thus $npq \to np = a$). However, we can easily give an independent derivation. First we extend the definition of the arithmetic mean E(X) and the variance $E\{[X - E(X)]^2\}$ to distributions having infinitely many values by defining E(X) as $\sum_{\text{all } x} x f(x)$ and $E\{[X - E(X)]^2\}$ as $\sum_{\text{all } x} [x - E(X)]^2 f(x)$. Remembering that $e^a = \sum_{x=0}^{\infty} a^x/x!$, we have

$$E(X) = \sum_{x=0}^{\infty} \frac{x\,a^x e^{-a}}{x!} = 0 + \sum_{x=1}^{\infty} \frac{a^x e^{-a}}{(x-1)!} = a e^{-a} \sum_{x-1=0}^{\infty} \frac{a^{x-1}}{(x-1)!} = a e^{-a}(e^{a}) = a$$

$$\begin{aligned}
E(X^2) &= \sum_{x=0}^{\infty} \frac{x^2 a^x e^{-a}}{x!} = \sum_{x=0}^{\infty} \frac{(x^2 - x + x)\,a^x e^{-a}}{x!} \\
&= \sum_{x=0}^{\infty} \frac{x(x-1)\,a^x e^{-a}}{x!} + \sum_{x=0}^{\infty} \frac{x\,a^x e^{-a}}{x!} \\
&= 0 + 0 + a^2 e^{-a} \sum_{x-2=0}^{\infty} \frac{a^{x-2}}{(x-2)!} + a = a^2 e^{-a}(e^{a}) + a = a^2 + a
\end{aligned}$$

$$\mathrm{Var} = E(X^2) - [E(X)]^2 = a^2 + a - a^2 = a \qquad \text{Q.E.D.}^1$$

Thus both the mean and variance of a Poisson distribution are equal to a.

We can fit a Poisson to sample data in a way similar to that in which we fit a binomial, that is, by computing E(X) for the sample and then setting a = E(X). Alternately, we could compute the variance of the sample and set a = Var; we state without proof that it is best in fitting a Poisson to use E(X) as an estimate of a rather than Var or any combination of E(X) and Var.

¹ Actually in this last step we are making use of a theorem which we have proved only for the finite case, but which also holds for all cases.
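
The equality of mean and variance can be checked numerically by truncating the infinite sums. This is a sketch under an explicit assumption: 200 terms are taken to be enough for the tail to be negligible at the value of a used here (a = 2.5, chosen arbitrarily). The name `poisson_moments` is illustrative.

```python
from math import exp

def poisson_moments(a, terms=200):
    """Approximate E(X) and Var for f(x) = a^x e^(-a) / x! by summing `terms` terms."""
    f = exp(-a)                 # f(0)
    mean = second = 0.0
    for x in range(terms):
        mean += x * f           # accumulates sum of x * f(x)
        second += x * x * f     # accumulates sum of x^2 * f(x)
        f *= a / (x + 1)        # recursion f(x + 1) = f(x) * a / (x + 1)
    return mean, second - mean ** 2

mean, var = poisson_moments(2.5)
print(mean, var)
```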
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6.5. Poisson Distributions as Exact. Thus far we have dis¬ 
cussed Poisson distributions only as approximations to binomial 
distributions. There are numerous situations, however, in 
which Poisson distributions are theoretically exact. These 
situations are those in which events are distributed in time (or 
space) in such a way that not only will information about the 
position of one event not help to predict the position of any 
other specific event, but even information about how many 
events occur in one interval of time (or region of space) does not 
help to predict how many events occur in any other interval. 
These two conditions are stated briefly by saying that the events 
are distributed individually and collectively at random. 

In Exercise 5.4.6 we mentioned the emission of electrons by a
radioactive substance as being such that whether an electron
has been emitted in the recent past has nothing to do with
whether one will be emitted in the present. In that exercise we
brought this situation within the scope of a binomial by providing
the information that N electrons had been emitted during a
certain interval and then asking what we should expect if we
examine part of the records. This maneuver was highly artificial,
however. If we are interested in predicting not what we
shall find in the records but what the substance will itself do in
a specified interval, obviously we cannot know N, that is, the
total number of electrons emitted during the interval L from
which our specified interval l is drawn. If, however, we know,
from previous observations (or from theoretical considerations),
the mean value of N, we can find the probability that x electrons
will be emitted during the interval l, and this probability
is given exactly not by the binomial but by the Poisson. As the
expectation for the number emitted during L is N, the expectation
for the number emitted during l is lN/L. The probability
that x electrons will be emitted is given by

$$f(x) = \frac{(lN/L)^x e^{-lN/L}}{x!}$$

Unlike the case in which the Poisson is used as an approximation,
no restrictions need be placed on the relative sizes of l, L, or
N. Furthermore, the Poisson rule applies to the total interval
(that is, l = L) as well as to any subinterval; that is, the probability
that x electrons will be emitted during L is given by

$$f(x) = \frac{N^x e^{-N}}{x!}$$

The proof that the Poisson rule is exact for the type of situation
just described is beyond the scope of this text. 1
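
As an illustration of the rule for a subinterval, the following Python sketch evaluates the Poisson probability with mean lN/L. It is not from the original text; the function names and the numbers (N = 120, L = 60, l = 1.5) are hypothetical.

```python
import math

def poisson_pmf(x, mean):
    """Poisson probability mean^x e^(-mean) / x!."""
    return mean ** x * math.exp(-mean) / math.factorial(x)

def prob_in_subinterval(x, N, L, l):
    """Probability of x events in a subinterval of length l,
    when the mean number of events in the whole interval of length L is N."""
    return poisson_pmf(x, l * N / L)

# Hypothetical values: a mean of 120 events in 60 minutes, subinterval of 1.5 minutes,
# so the mean for the subinterval is lN/L = 3.
N, L, l = 120.0, 60.0, 1.5
p_at_most_one = sum(prob_in_subinterval(x, N, L, l) for x in range(2))
```

Here `p_at_most_one` is f(0) + f(1) = 4e^(-3) for the assumed values.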


EXERCISES 

6.5.1. The mean number of cars passing over a toll bridge during the time 
interval 3 to 4 p.m. is 600. The cars pass individually and collectively at random.
Write an expression for the probability that not more than 2 cars will
pass during the one-minute interval 3:39 to 3:40. 

6.5.2. During the time interval 4:45 to 5 p.m., telephone calls in a certain 
exchange are placed individually and collectively at random, the average 
being 1,000. Write an expression for the probability that fewer than 1,500 
calls will be placed during this interval. 

6.5.3. Approximate, by means of a Poisson, the probability that a random 
sample of 60 bearings will contain 4 or fewer defectives, if the proportion of 
defective bearings being turned out is Mo- 

6.5.4. Records were kept at each of 500 observation stations of the number 
of flying saucers spotted during the year. Fit a Poisson to the following data 
which were obtained: 


Saucers     Frequency
   0            75
   1           152
   2           147
   3            98
   4            41
   5            16
   6             5


1 A proof and further discussion of the Poisson are given by Fry (7).

7.1. Definition. Up to this point we have discussed only
populations having as values either qualitative categories or 
numbers which are “spread out” along the number axis in the 
sense that no value is a point about which infinitely many values 
cluster (we shall define this kind of spreading out precisely); 
furthermore, a nonzero proportion of the population falls at each 
value. It may seem to the student that these conditions should 
hold for almost all distributions. It turns out, however, that 
there is another very important class of distributions, which we 
shall discuss in the following chapters. The distributions which 
we have already discussed form a special class known as discrete 
distributions. 

Definition. A discrete population is a population such that a 
nonzero proportion of the population falls at each value and each 
finite interval of the number axis contains at most a finite 
number of values. 

Definition. A discrete distribution is the distribution of a discrete
population, i.e., the class of all ordered pairs such that the
second member of each pair is a value and the first member of 
each pair is the (nonzero) proportion of the population having 
the second member as value. 1 

Examples. 1. All distributions of finite populations. Obviously any finite 
population satisfies the conditions necessary and sufficient for a discrete popu¬ 
lation. (Note that there is nothing in the definition requiring the values to 
be numbers; it merely excludes the case, to be discussed later, in which a finite 
interval of the number axis contains infinitely many values.) 

1 This is the fr.f. We could have defined “distribution” in terms of the c.d.f.; 
however, the c.d.f. does not always exist.

2. Any Poisson distribution, for example, the distribution given by


in which x = 0, 1, 2, ... . 

7.2. The Logic of Statistical Inference for Discrete Distributions.
In Sec. 3.7 a short statement was made about the logic of
statistical inference; we shall now give a somewhat more sophisticated
statement. We start with a population Ω with a discrete
distribution and consider drawing a sample O_n at random.
If we define a statistic z, then z will have a sampling distribution,
depending upon n and the distribution of Ω. If we make
hypotheses about the distribution of Ω, we can derive hypothetical
sampling distributions of z, one for each hypothesis. In
each hypothetical sampling distribution we can mark out a
region of values ω such that the probability of z ∈ ω (the probability that
z will have a value in ω) is low, if the hypothetical sampling distribution
is in fact the sampling distribution. We then draw a
sample O_n, calculate z, and reject all hypotheses for which the
value of z we obtained is in the region ω. In most of the cases of
statistical inference which we have discussed, we have known the
functional form of the population (for example, that it is
dichotomous), and we have needed to hypothesize only values of
one or more parameters. In such cases rejection of a hypothesis
about the distribution of Ω amounts to a rejection of a hypothesis
about the value of one or more of its parameters.

8.1. Continuous Populations. Suppose for the moment that
we could construct a pointer that could land with equal likeli¬ 
hood on any point between 0 and 1. Suppose further that we 
plan to spin this pointer a large number of times. What is the 
expected proportion of spins that will result in the pointer’s land¬ 
ing at exactly y? Upon reflection, we see that the only answer 
that we can give is zero. For suppose that we should choose 
some very small proportion, say .0000001, as our answer; clearly 
we would, to be consistent, have to give this same answer for any 
other point. It can easily be shown that, if we should do this, 
the sum of all the expected proportions would be greater than 1 
(in fact, would be infinite), thus demonstrating that our choice 
must have been incorrect. To prove that no matter how small a
proportion ε is chosen for each point we will obtain a sum greater
than 1, we need merely consider the sequence of points 1, 1/2, 1/3,
1/4, 1/5, . . . (which of course does not by any means exhaust all
the points on the interval 0-1). Take any integer N greater
than 1/ε and then take the first N members of the above
sequence; their expected proportions will have a sum greater
than 1, showing that we must take 0 as the expected proportion.

Although we cannot construct a pointer 1 that can land on any 
point between 0 and 1, this imaginary case provides a way of 
thinking about conceptual or mathematical entities called continuous
populations and continuous distributions.

1 Even the best of pointers can of course be read only to a limited precision, that
is, there is a finite number of discriminably different points on the dial.

8.2. Proportion Density. Let us compare our imaginary
pointer that can land anywhere between 0 and 1 with a pointer
that can land anywhere between 0 and 2. Although in both
cases the expected proportion at each value is zero, for a large
number of spins the results are clustered more thickly or densely
in the case of the 0-1 pointer, so that for a region of a given
size,, for example, the region between and H, a greater pro¬ 
portion of the spins will be expected to lie in that region in the 
case of the 0-1 pointer than in the case of the 0-2 pointer. It is 
this notion of proportion density or probability density that we
shall use in defining continuous distribution so that we can distinguish
one continuous distribution from another.

Before we can define proportion density we must define cumulative
distribution function (c.d.f.) for a continuous population.

Definition. The cumulative distribution function (c.d.f.) of a 
continuous population is the class of all pairs such that the 
second member of each pair is a value and the first member is the 
proportion of the population having values less than or equal to 
that value. This definition is identical with that for a c.d.f. of a 
discrete population given in Chap. 3, although in the continuous 
case we can leave out the words “or equal to” because a zero 
proportion lies at each value. Note that in each case F (lowest
value) = 0 and F (highest value) = 1. Further, if x_1 < x_2, then
F(x_1) ≤ F(x_2). No function which fails to satisfy these conditions
can be a c.d.f.

Now let us see what the expected c.d.f. is for a large number of 
random spins of a pointer which can land on any point between 

0 and 1. In order to do this we must introduce another 
definition. 

Definition. If a certain kind of event can occur at any point 
of a region of size L, and if each point is judged as likely to be the 
locus of the event as any other point, then in the occurrence of a 
large number of events of this kind, the expected proportion 
which will occur in a subregion of size K is K/L. 

According to our definition, the expected proportion of spins
resulting in the 0-1 pointer's landing between 0 and any point x
is simply x/1 = x. In other words, the expected F(x) = x when
0 ≤ x ≤ 1. Similarly, the expected F for spins of the pointer
which can land anywhere between 0 and 2 is given by F(x) = x/2
when 0 ≤ x ≤ 2.

We can also imagine a pointer ranging between 0 and 1 with
variable friction so that the expected c.d.f. is given by F(x) = x^2
when 0 ≤ x ≤ 1.
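
The expected c.d.f.'s F(x) = x and F(x) = x/2 for the two frictionless pointers can be illustrated by simulation. The sketch below is not from the original text; it assumes Python's pseudo-random generator as a stand-in for the imaginary pointer, and the names are hypothetical.

```python
import random

def empirical_cdf(spins, x):
    """Proportion of the spins having values less than or equal to x."""
    return sum(1 for s in spins if s <= x) / len(spins)

rng = random.Random(0)  # fixed seed so that the run is repeatable
spins_01 = [rng.uniform(0.0, 1.0) for _ in range(100_000)]  # the 0-1 pointer
spins_02 = [rng.uniform(0.0, 2.0) for _ in range(100_000)]  # the 0-2 pointer

# For a large number of spins these should lie close to x and x/2 respectively.
observed_01 = empirical_cdf(spins_01, 0.3)
observed_02 = empirical_cdf(spins_02, 0.3)
```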

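We might even consider such c.d.f.'s as those given by

$$F(x) = \frac{x^2 + x}{2} \quad \text{when } 0 \le x \le 1$$

$$F(x) = \frac{x^3 + 2x^2 + 5x}{8} \quad \text{when } 0 \le x \le 1$$

$$F(x) = x e^{x-1} \quad \text{when } 0 \le x \le 1$$

$$F(x) = \frac{\log (x + 1)}{\log 2} \quad \text{when } 0 \le x \le 1$$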


Now we are in a position to build up the concept of proportion
density. In the first place, the c.d.f. (F) increases as we increase
x, and the proportion density at a point x_1 is merely the rate of
increase of F at that point x_1. To see exactly what is meant by
rate of increase, consider taking a very small region of length Δx
extending from x_1 to x_1 + Δx. Call the proportion of the
population falling in this region ΔF(x). The ratio ΔF(x)/Δx
represents a kind of average proportion density in this region,
and the limit 1 of ΔF(x)/Δx as Δx → 0 is the proportion density at
the point x_1.

Definition. The proportion density at a point x_1 is the rate of
increase of F(x) with respect to x at the point x_1, that is, it is the
limit of ΔF(x)/Δx as Δx → 0.

The limit of ΔF(x)/Δx as Δx → 0 is also called the derivative of
the c.d.f. with respect to x and is written dF(x)/dx, the symbol d
being used to indicate the approach to 0.
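
The way the ratio ΔF(x)/Δx settles toward a limit can be seen numerically. The Python sketch below is not from the original text; it uses the c.d.f. F(x) = (x^2 + x)/2 mentioned earlier and a hypothetical helper name.

```python
def F(x):
    """The c.d.f. F(x) = (x^2 + x)/2 for 0 <= x <= 1."""
    return (x ** 2 + x) / 2

def difference_quotient(F, x, dx):
    """Average proportion density over the region from x to x + dx."""
    return (F(x + dx) - F(x)) / dx

# At x = 0.4 the quotient equals 0.9 + dx/2, so it approaches 0.9 as dx shrinks.
quotients = [difference_quotient(F, 0.4, dx) for dx in (0.1, 0.01, 0.001)]
```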

1 It is assumed that the student has an intuitive concept of limit; for a precise
definition, see Appendix B.

It is very easy to find dF(x)/dx when F is given by any rule of
the form

$$F(x) = \frac{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0}{b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0}$$

in which the a's and b's are constants and at least one b is not
zero. (A function given by a rule of this form is called an
“algebraic function.”) One effective process is the four-step
rule, as follows:

First Step. In F(x) replace x by x + Δx and calculate the
value of F(x) + ΔF(x).

Second Step. Subtract F(x) from F(x) + ΔF(x) to obtain
ΔF(x).

Third Step. Divide ΔF(x) by Δx to obtain ΔF(x)/Δx.

Fourth Step. Find the limit of ΔF(x)/Δx by setting Δx = 0.

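Examples. 1. F(x) = (x^2 + x)/2 when 0 ≤ x ≤ 1. This is in the form
described above with a_2 = 1, a_1 = 1, b_0 = 2, and all the other a's and b's = 0.

First step:

$$F(x) + \Delta F(x) = \frac{(x + \Delta x)^2 + (x + \Delta x)}{2} = \frac{x^2 + x}{2} + \frac{2x\,\Delta x + \Delta x + (\Delta x)^2}{2}$$

Second step:

$$F(x) + \Delta F(x) - F(x) = \Delta F(x) = \frac{2x\,\Delta x + \Delta x + (\Delta x)^2}{2}$$

Third step:

$$\frac{\Delta F(x)}{\Delta x} = \frac{2x + 1 + \Delta x}{2}$$

Fourth step:

$$\lim_{\Delta x \to 0} \frac{\Delta F(x)}{\Delta x} = \frac{2x + 1}{2} = x + \frac{1}{2} \quad \text{when } 0 \le x \le 1$$

or

$$\frac{dF(x)}{dx} = x + \frac{1}{2}$$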

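2. F(x) = (x^3 + 2x^2 + 5x)/8 when 0 ≤ x ≤ 1.

$$F(x) + \Delta F(x) = \frac{(x + \Delta x)^3 + 2(x + \Delta x)^2 + 5(x + \Delta x)}{8}$$

$$\Delta F(x) = \frac{3x^2\,\Delta x + 3x(\Delta x)^2 + (\Delta x)^3 + 4x\,\Delta x + 2(\Delta x)^2 + 5\,\Delta x}{8}$$

$$\frac{\Delta F(x)}{\Delta x} = \frac{3x^2 + 3x\,\Delta x + (\Delta x)^2 + 4x + 2\,\Delta x + 5}{8}$$

$$\lim_{\Delta x \to 0} \frac{\Delta F(x)}{\Delta x} = \frac{3x^2 + 4x + 5}{8} \quad \text{or} \quad \frac{dF(x)}{dx} = \frac{3x^2 + 4x + 5}{8} \quad \text{when } 0 \le x \le 1$$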

Transcendental functions, that is, functions which cannot be
given by a rule of the form described, such as those given by
F(x) = xe^(x−1) when 0 ≤ x ≤ 1, F(x) = log (x + 1)/log 2 when
0 ≤ x ≤ 1, F(x) = sin x when 0 ≤ x ≤ 90°, require more subtle procedures;
fortunately, however, differentiation has been reduced
to simple rules which can be found in any calculus textbook; 1
for example, if F(x) = cx^k, in which c and k are constants, then
dF(x)/dx = kcx^(k−1). Further, if F(x) = G(x) + H(x), in which
G and H are functions, then dF(x)/dx = dG(x)/dx + dH(x)/dx.
Using these two rules jointly, we can immediately obtain the
answers for the examples just given; the details are left to the
student.
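
The two rules just quoted (the rule for cx^k and the rule for sums) are enough to differentiate any polynomial term by term. The Python sketch below is not from the original text; the representation of a polynomial as a list of coefficients and the function names are assumptions made only for illustration.

```python
def differentiate_polynomial(coeffs):
    """coeffs[k] is the coefficient of x^k; each term c x^k becomes k c x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

F1 = [0.0, 0.5, 0.5]            # (x^2 + x)/2
F2 = [0.0, 5 / 8, 2 / 8, 1 / 8]  # (x^3 + 2x^2 + 5x)/8

f1 = differentiate_polynomial(F1)  # coefficients of x + 1/2
f2 = differentiate_polynomial(F2)  # coefficients of (3x^2 + 4x + 5)/8
```

The resulting coefficient lists agree with the derivatives obtained by the four-step rule for the same two c.d.f.'s.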

We shall denote proportion density not only by dF(x)/dx but
also by f(x), that is, f(x) = dF(x)/dx. Whereas in the discrete
case f(x) means the proportion having the value x, in the continuous
case it means the proportion (or probability) density.

8.3. Continuous Distributions. We can now define “continuous
distribution” in the following way.

Definition. A continuous distribution is the distribution of a
continuous population, that is, the class of pairs such that the
second member of each pair is a value, and the first member of
the pair is the proportion density (probability density) for that
value; in other words, the class of all pairs of the form [f(x), x].*
In the continuous case we can never actually list the class of
pairs; we must rely upon a rule.

Now suppose that we are given the frequency function f and
asked to find the cumulative distribution function F. Obviously,
since f(x) = dF(x)/dx, F is simply a function such
that dF(x)/dx = f(x) and also such that F (highest value) = 1.
To find F, therefore, we “guess” and then check our answer by
differentiating; or, even better, we look in a table showing, for a
given function, another function of which the first is the derivative.
Such a function is called a “primitive function” of the
first. F is therefore a primitive function of f.

Examples. 1. If f is given by f(x) = 1 when 0 ≤ x ≤ 1, what is F? We
see from our table of primitive functions in Appendix C that a function of x
whose derivative is a constant a is ax plus a constant c. We have therefore
F(x) = x + c when 0 ≤ x ≤ 1. Also, since f is a probability density function,
F(1) = 1 + c = 1. Therefore c = 0. The answer is
F(x) = x when 0 ≤ x ≤ 1.

1 Many of these “rules” are given in Appendix C; they are actually theorems,
which can be derived from the definition of the derivative.

* This definition is in terms of the frequency function. We could have used
the c.d.f., as in the continuous case the c.d.f. always exists.

2. If / is given by f(x) = — ^ 5 - when 1 < x < 3, what is FI Prom our 
table we see that if f(x) = a_n x^n + a_(n−1) x^(n−1) + . . . + a_1 x + a_0, then

$$F(x) = \frac{a_n x^{n+1}}{n + 1} + \frac{a_{n-1} x^{n}}{n} + \cdots + a_0 x + c$$


In this case we have n — 2, a 2 


% 2 > fli — 0 , a Q — % 2 . Therefore 


Also, since/is a probability density function, F( 3) = 1. Therefore, c = % . 
[Since F (lowest value) must be equal to 0, it would have been easier in this
case to set F(1) = 0 and solve for c. Regardless of which is used, a check
should always be made to see whether F for each extreme value has the
proper value, either 0 or 1. If such is not the case, then either we have made
an error in finding F or the f with which we started was not in fact a probability
density function.]


EXERCISES 

8.3.1. We plan to spin a pointer that can land on any point between .01700 . . .
and .82500 . . . , all points being equally likely. Find the probability
that the pointer will land

a. between .020 and .025 inclusive 


b. between .020 and .025, excluding the end points 

c. on the point .750 

d. on the interval .030-.035 or the interval .150—. 165 

e. on a point greater than .750 

f. on a point not greater than .750

8.3.2. Suppose we put variable friction on a pointer so that F(x) = 4x^2
when 0 ≤ x ≤ .5. Find

a. the probability density at the point .250 

b. the probability that the pointer will land between .30 and .45 

c. the probability that the pointer will land on a point not less than .40 

8.3.3. Suppose we put variable friction on a pointer running from 0 to 2 so
that the probability density at a given point is directly proportional to the
distance of that point from 2. Find

a. an expression giving the probability density

b. the probability density at the point 1

c. an expression giving the c.d.f.

d. the probability that the pointer will land between 1 and 1.5

e. the probability that the pointer will land on a point at least as great as .7

8.3.4. Suppose we put variable friction on a pointer running from 0 to 3 so
that the probability density at a given point is directly proportional to the
distance of that point from 0. Find

a. an expression giving the probability density 

b. the probability density at the point 2 

c. an expression giving the c.d.f. 

d. the probability that the pointer will land between 1.5 and 2.0 

8.3.5. Suppose the probability density is given by fix) = ax 2 — H when 
1 < x < 2. Find 

a. f(x) (that is, find the value of a) 

b. F(x)

c. the probability between 1.5 and 2.0 

8.3.6. Suppose the probability density is given by f(x) = x-H when 
a < x < 2. Find 

a. F(x)

b. the value of a

8.3.7. Suppose the c.d.f. is given by F(x) = log x when 1 ≤ x ≤ e. Find

a. f(x)

b. the probability between 1.5 and 2.0

8.3.8. Suppose F(x) = e^x − 1 when 0 ≤ x ≤ log 2. Find

a. f(x)

b. the probability between .3 and .5

8.3.9. Suppose the probability density is given by fix) = ax + * when 

1 < x < 2. Find 

a. f(x)

b. F(x)

8.4. Definite Integrals. Let us consider another way in
which we can obtain F from f. Suppose that we know f and we
wish to find the proportion of the population falling between two
definite limits, a and b (with a less than b), that is, we wish to
find F(b) − F(a). We can proceed as follows.

Divide the interval a to b into n equal subintervals, each of
length (b − a)/n, which we shall call Δx. Within each subinterval,
choose any value of f(x) as the estimated average proportion
density for that interval; call this estimate f(x_i). Then
for each subinterval i the approximate proportion (or probability)
falling within i is the estimated average proportion
density times the length of the interval, that is, f(x_i) Δx. For
the entire interval, therefore, the approximate proportion is
$\sum_{i=1}^{n} f(x_i)\,\Delta x$. The approximation will tend to become better
and better as n gets larger and larger, and, as a matter of fact,
the approximation will approach the correct value F(b) − F(a)
as n approaches infinity, that is,

$$\lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = F(b) - F(a)$$

Our approximation process can be illustrated by plotting f(x)
as the ordinate versus x as the abscissa. The estimated probability
f(x_i) Δx is the area of the rectangle of height f(x_i) and length Δx.
As Δx → 0, the “corners” which extend beyond the curve
y = f(x) become smaller and smaller; as a matter of fact,
lim Σ f(x_i) Δx can be used as a definition of the area under the
curve y = f(x).

Fig. 8.4.1. The approximation Σ f(x_i) Δx to the amount of probability between
a and b.
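
The approach of the sum Σ f(x_i) Δx to F(b) − F(a) can be watched numerically. The following Python sketch is not from the original text; it takes the density f(x) = x + 1/2, whose c.d.f. is F(x) = (x^2 + x)/2 when 0 ≤ x ≤ 1, chooses the left end of each subinterval as x_i (any other choice within the subinterval would do), and uses hypothetical limits a = .2 and b = .8.

```python
def riemann_sum(f, a, b, n):
    """Sum of f(x_i) * dx over n equal subintervals, with x_i the left end point."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

def f(x):
    """Proportion density f(x) = x + 1/2."""
    return x + 0.5

def F(x):
    """Cumulative distribution function F(x) = (x^2 + x)/2."""
    return (x ** 2 + x) / 2

exact = F(0.8) - F(0.2)  # amount of probability between .2 and .8
errors = [abs(riemann_sum(f, 0.2, 0.8, n) - exact) for n in (10, 100, 1000)]
```

The entries of `errors` shrink as n grows, in line with the limit statement above.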

We abbreviate the expression “$\lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x$ taken over
the interval a to b” by $\int_a^b f(x)\,dx$, the sign ∫ indicating that we
are taking the limit of a sum (it comes from an old form of the
letter S) and dx indicating the fact that as n → ∞, Δx → 0.
$\int_a^b f(x)\,dx$ is called the definite integral of f of x between the
limits a and b. The sign ∫ is called the integral sign.

In summary we state that the definite integral of a probability 
density function between the limits a and b gives the amount of
probability between those limits. It is the area under the curve
given by the density function.

If we take for a the lowest value in the distribution (or the
lower limit of the values) we have

$$\int_a^b f(x)\,dx = F(b) - F(a) = F(b) - 0 = F(b)$$

If the values extend to −∞, as in the case of a type of
distribution we shall describe in the next chapter, 1 then we
take the limit of $\int_a^b f(x)\,dx$ as a → −∞. We abbreviate
“$\lim_{a \to -\infty} \int_a^b f(x)\,dx$” by $\int_{-\infty}^{b} f(x)\,dx$. Therefore, for a distribution
whose values extend to −∞ we have $\int_{-\infty}^{b} f(x)\,dx = F(b)$. If the
values extend also to ∞, we can take the limit of $\int_{-\infty}^{b} f(x)\,dx$ as
b → ∞, calling this limit $\int_{-\infty}^{\infty} f(x)\,dx$. This limit is of course
always 1 for any probability density function f.

As we saw before, lim Σ f(x_i) Δx = F(b) − F(a), or

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

This statement says that the definite integral of f of x is the
difference between two definite values of the primitive function
F(x).

For any integrable functions g and h and any constant c,

$$\int_R [c\,g(x) + h(x)]\,dx = c \int_R g(x)\,dx + \int_R h(x)\,dx$$

in which R indicates that we are integrating over some range.
This important property is summed up in the statement that the
integral is a linear operator.

1 Mathematicians usually extend the range of values in any distribution to
include the entire number axis by setting f(x) = 0 for all numbers outside what is
called in this text the range of values; we follow this convention in the Appendix.

8.5. Indefinite Integrals. Suppose that we replace the constant
b with the variable x in the definite integral $\int_a^b f(x)\,dx$.
We then have $\int_a^x f(x)\,dx = F(x) - F(a)$. As −F(a) is a constant,
which will be different for each different choice of a, we can
write $\int_a^x f(x)\,dx = F(x) + c$, or more briefly $\int f(x)\,dx = F(x) + c$.
This is called the indefinite integral of f of x. If we differentiate
the indefinite integral we obtain the probability density function,
that is,

$$\frac{d[F(x) + c]}{dx} = \frac{dF(x)}{dx} = f(x)$$

Therefore the indefinite integral is a primitive function. 1

Unfortunately integration cannot be reduced to rules as
thoroughly as differentiation, and in courses in the calculus considerable
time is spent in learning techniques of integration.
Many indefinite (as well as some definite) integrals are
given in tables in calculus texts. This text contains a few of
these in Appendix C.

Examples. 1. Given f(x) = 4x^3/15 when 1 ≤ x ≤ 2, what is F? We find
in tables of indefinite integrals that

$$\int a x^n\,dx = \frac{a x^{n+1}}{n + 1} + c$$

Therefore

$$F(x) = \int \frac{4x^3}{15}\,dx = \frac{x^4}{15} + c$$

But since f is a probability density function, it must be the case that F(2) = 1;
therefore F(2) = 1 = 16/15 + c and c = −1/15. Therefore

$$F(x) = \frac{x^4}{15} - \frac{1}{15}$$


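2. Given f(x) = 1/[(x − 4) log 3] when 5 ≤ x ≤ 7, what is F? From tables,

$$\int \frac{a}{x + b}\,dx = a \log (x + b) + c$$

Note that a = 1/log 3, b = −4.

$$F(x) = \int \frac{1}{(x - 4) \log 3}\,dx = \frac{1}{\log 3} \log (x - 4) + c$$

But F(7) = 1.

$$\therefore\ \frac{1}{\log 3} \log (7 - 4) + c = 1 \qquad \frac{\log 3}{\log 3} + c = 1 \qquad c = 0$$

$$\therefore\ F(x) = \frac{\log (x - 4)}{\log 3}$$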
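
The check recommended in Sec. 8.3, that F should be 0 at the lowest value and 1 at the highest and that differentiating F should give back f, can be carried out numerically for the two examples above. This Python sketch is not from the original text; the function names and the central-difference helper `slope` are hypothetical.

```python
import math

def f1(x):
    return 4 * x ** 3 / 15                   # density of Example 1, 1 <= x <= 2

def F1(x):
    return x ** 4 / 15 - 1 / 15              # c.d.f. found in Example 1

def f2(x):
    return 1 / ((x - 4) * math.log(3))       # density of Example 2, 5 <= x <= 7

def F2(x):
    return math.log(x - 4) / math.log(3)     # c.d.f. found in Example 2

def slope(F, x, h=1e-6):
    """Central-difference estimate of dF(x)/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

end_values = (F1(1), F1(2), F2(5), F2(7))    # should be 0, 1, 0, 1
density_gap_1 = abs(slope(F1, 1.5) - f1(1.5))
density_gap_2 = abs(slope(F2, 6.0) - f2(6.0))
```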


EXERCISES 

8.5.1. Let a distribution be given by F(x) = e 2 


■ b when < x < 3 


Find 2 2 

a. 5 

b. the probability in the interval ,5-.6 

1 This is the fundamental theorem of the calculus, which we hope the
preceding discussion has made plausible. A proof is given in the Appendix, Sec. B.8.

8.5.2. Let a distribution be given by f(x) = cx − 1 when 1 ≤ x ≤ (1 + √5)/2.
Find

a. c

b. F(x)

c. the probability in the interval 1-1.1

8.5.3. Let a distribution be given by f(x) = 2/x when a ≤ x ≤ e^2. Find

a. F(x)


c. the probability in the interval e 1,75 e 1 - 

8.5.4. Suppose we have a pointer that can land between 0° and 90° and with
variable friction so that f(x) = b cos x. Find

a. F(x)

b. the probability that the pointer will land between 45° and 75°
8.5.5. Which of the following rules cannot give distributions for the interval
1 ≤ x ≤ 2?

a. f(x) = x^2 + 2x − 4

b. f(x) = ax^2 + 2x − 4

c. F(x) = x

d. F(x) = cx

e. F(x) = x^2 − 3

f. F(x) = ax^2 − 3

g. f(x) = log x

h. f(x) = a log x

i. f(x) = log x + c

j. F(x) = x^3 − 2x^2 − 1

k. F(x) = x^3 − cx − 1


8.6. A Physical Model for Distributions. It is often useful to
conceptualize a distribution in the following way. Suppose we
take a unit mass of soft clay and spread it on the values so that in
case the distribution is discrete each value has that proportion
of the unit mass which is paired with it in the distribution (fr.f.),
and in case the distribution is continuous each value has the corresponding
density. In other words, in the discrete case we
would place a certain amount of clay (probability) at each value;
in the continuous case we would spread out the clay so that
although at a given point there is no clay (which is perfectly
reasonable since a point has zero volume) yet the density of clay
at that point is the proportion or probability density paired with
it in the distribution.

8.7. Moments of Continuous Distributions. The expected
value or expectation of a member drawn at random from a continuous
population is defined as $E(X) = \int_R x f(x)\,dx$, in which R
is the total range of values for which f(x) is defined. This
expected value is also called the arithmetic mean (see Sec. 4.2).

Examples. 1. If f(x) = x when 0 ≤ x ≤ √2, then

$$E(X) = \int_0^{\sqrt{2}} x^2\,dx = \frac{2\sqrt{2}}{3}$$

2. If f(x) = 1/x when 1 ≤ x ≤ e, then $E(X) = \int_1^e dx = e - 1$.

We also define the expected value of a function g as

$$E[g(X)] = \int_R g(x) f(x)\,dx$$

Note that if c is a constant and g and h are two functions of X,
then it follows from the linear nature of the integral that
E[cg(X) + h(X)] = cE[g(X)] + E[h(X)]. Thus E is a linear
operator. Note also that E(c) = c. It was stated earlier that
E has these properties in the finite case, and we state without
proof that it has these properties in the discrete case in general,
as well as in the continuous case.

Definition. The kth moment about the point c is the expected
value of the kth power of the deviation from the point c, that is,

$$\mathrm{mom}_c^k(X) = E(X - c)^k = \int_R (x - c)^k f(x)\,dx$$

Examples. 1. If k = 1, c = 0, we have the arithmetic mean. 

2. If k = 2, c — E(X), we have the variance. 
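
The definition of a moment can be turned directly into a numerical computation. The Python sketch below is not from the original text; it approximates the integral by a midpoint sum (an assumption of convenience, any integration method would serve) and applies it to the density f(x) = x when 0 ≤ x ≤ √2 used in the first example of this section. The function names are hypothetical.

```python
import math

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation to the definite integral of g from a to b."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

def moment(f, a, b, k, c=0.0):
    """kth moment about the point c of the density f defined on a <= x <= b."""
    return integrate(lambda x: (x - c) ** k * f(x), a, b)

def f(x):
    return x  # density f(x) = x when 0 <= x <= sqrt(2)

upper = math.sqrt(2)
mean = moment(f, 0.0, upper, 1)               # k = 1, c = 0: the arithmetic mean
variance = moment(f, 0.0, upper, 2, c=mean)   # k = 2, c = E(X): the variance
```

For this density the mean is 2√2/3 and the variance works out to 1/9, which the numerical values should match closely.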


EXERCISES 

8.7.1. Find the mathematical expectation and variance of X for each of the 
distributions given in Exercises 8.3.1 to 8.3.9. 

8.7.2. Find the mathematical expectation and variance of X for each of the 
distributions given in Exercises 8.5.2 to 8.5.4. 

8.7.3. If f(x) = ce^x when −∞ < x ≤ 0, find

a. the mean of X 

b. the variance of X 

8.7.4. If F(x) = c cos x when 90° < x < 180°, find 

a. momj 

b. mom|, (X) 



Chapter 9 

NORMAL DISTRIBUTIONS 


9.1. A Normal Distribution as an Approximation to a Binomial
Distribution. In Sec. 5.3 we discussed binomial distributions
and the kind of situation in which a binomial distribution
arises. A binomial distribution was defined as one given by
$f(x) = C^n_x p^x q^{n-x}$, in which n is an integer, p lies between zero and
one, q = 1 − p, and x takes only the values 0, 1, . . . , n. We
now state without proof the following theorem.

Theorem 9.1. An approximation to $C^n_x p^x q^{n-x}$ is

$$\frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(x - np)^2}$$

or, more precisely,

$$\lim_{n \to \infty} \frac{C^n_x p^x q^{n-x}}{\dfrac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(x - np)^2}} = 1$$

Unfortunately the proof of this theorem is beyond the scope of
this book. 1 The reader should be aware of the following points,
however:

1. The approximation is better the larger the size of n, as
illustrated in Table 9.1.1 for the case p = 1/2, x = np.

2. The approximation is better the closer to 1/2 the value of
p, as illustrated in Table 9.1.2 for the case n = 100 and x = np.

3. The approximation is better in terms of per cent error the
smaller the absolute size of x − np, with fixed n and p; in other
words, the closer x is to the mean, the better the approximation,
as illustrated in Table 9.1.3 for the case n = 100, p = 1/2.

1 A proof is given by Feller (4).

Table 9.1.1. Normal Approximation to Binomial for p = 1/2, x = np

                        C^n_x p^x q^(n−x)   Approximation   Difference   Per cent error
n = 10, x = 5                .2461             .2523          .0062          2.50
n = 100, x = 50              .0796             .0798          .0002          0.25
n = 1,000, x = 500           .0252             .0252          .0000          0.00
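
The first two rows of Table 9.1.1 can be recomputed directly from the expressions in Theorem 9.1. The Python sketch below is not from the original text; the function names are hypothetical, and only standard-library calls are used.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability C(n, x) p^x q^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_approx(x, n, p):
    """The approximation of Theorem 9.1 with k^2 = npq and c = np."""
    npq = n * p * (1 - p)
    return math.exp(-((x - n * p) ** 2) / (2 * npq)) / math.sqrt(2 * math.pi * npq)

# (exact, approximation) for n = 10, x = 5 and n = 100, x = 50, rounded to four places.
row_10 = (round(binomial_pmf(5, 10, 0.5), 4), round(normal_approx(5, 10, 0.5), 4))
row_100 = (round(binomial_pmf(50, 100, 0.5), 4), round(normal_approx(50, 100, 0.5), 4))
```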


Table 9.1.2. Normal Approximation to Binomial for n = 100, x = np

                        C^n_x p^x q^(n−x)   Approximation   Difference   Per cent error
p = 1/10, x = 10             .1318             .1330          .0012          0.91
p = 1/4, x = 25              .0918             .0921          .0003          0.33
p = 1/2, x = 50              .0796             .0798          .0002          0.25


Table 9.1.3. Normal Approximation to Binomial for n = 100, p = 1/2

(x − np)/√(npq)   C^n_x p^x q^(n−x)   Approximation   Difference   Per cent error
       4              .000023            .000026        .000003        13.04
       2              .01084             .01080         .00004          0.37
       0              .0796              .0798          .0002           0.25

In the discussion above,

$$\frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(x - np)^2}$$

has been considered a probability, defined only for x = 0, 1, . . . , n.
We can, however, consider this quantity a probability density,
defined for all values of x from −∞ to +∞. Considered in this
way, the approximation for $C^n_x p^x q^{n-x}$ becomes

$$\int_{x - .5}^{x + .5} \frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(y - np)^2}\,dy$$

that is, the area under the curve between x − .5 and x + .5.
There is a great advantage in doing this (in addition to an
increase in accuracy); for example, suppose that we wish to
approximate $\sum_{x=10}^{20} C^n_x p^x q^{n-x}$. We need merely take

$$\int_{9.5}^{20.5} \frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(x - np)^2}\,dx$$

as our approximation. As a matter of fact, we do not even have
to perform this integration but instead can use tables, as will be
demonstrated in Sec. 9.4.

When

$$\frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\, e^{-(1/2npq)(x - np)^2}$$

is considered a probability
density, we have a normal distribution, a definition of which
follows in the next section.
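
The area approximation to a sum of binomial terms can be compared with the exact sum. The Python sketch below is not from the original text; it assumes hypothetical values n = 30 and p = 1/2, and it uses the standard-library error function `math.erf` in place of the tables mentioned above to obtain the area under the normal curve between 9.5 and 20.5.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability C(n, x) p^x q^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_cdf(z):
    """Area under the normal curve with c = 0, k = 1 to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_area(lo, hi, n, p):
    """Area between lo - .5 and hi + .5 under the normal curve with c = np, k^2 = npq."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((hi + 0.5 - mean) / sd) - normal_cdf((lo - 0.5 - mean) / sd)

n, p = 30, 0.5  # hypothetical values, not from the text
exact = sum(binomial_pmf(x, n, p) for x in range(10, 21))
approx = normal_area(10, 20, n, p)
```

For the assumed n and p the two values differ by less than one hundredth.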

9.2. Definition of Normal Distributions

Definition. A normal distribution is any continuous distribution
given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,k}\, e^{-(1/2k^2)(x - c)^2}$$

in which c and k are constants and x ranges over all numbers.

We already know, by the definition of “distribution,” that
for a function f to be a distribution it must be the case that
$\int_R f(x)\,dx = 1$, in which R indicates that we are integrating over
the range for which the probability density is defined. It must
also be the case that f(x) is nonnegative for all values of x in R.
Since $e^{-(1/2k^2)(x - c)^2}$ is positive for any values of c and k, our total
expression will be positive if k is positive. Furthermore, we
state without proof 1 that for any value of c and any positive k,

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,k}\, e^{-(1/2k^2)(x - c)^2}\,dx = 1$$

Therefore, a rule of this form will always give a distribution, provided
only that k is positive.

1 A proof is given in the Appendix, Sec. B.21.

Examples. 1. Since $\sqrt{npq}$ and np are constants (for any given binomial
distribution), we see, by setting $k^2 = npq$ and c = np, that

$$f(x) = \frac{1}{\sqrt{2\pi}\sqrt{npq}}\, e^{-(x-np)^2/(2npq)}$$

gives a normal distribution, as stated in the last paragraph of Sec. 9.1.

2. With k = 1 and c = 0 we obtain the special case $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.
This special case is so important that it is sometimes referred to as the normal
distribution (see Sec. 9.4).

Note that $(x - c)^2 = [-(x - c)]^2$; for example, if x is 5 units
above c, so that x - c = 5, we obtain the same value for f(x)
that we obtain if x is 5 units below c, so that x - c = -5. This
means that f(x) is symmetrical about c.

Furthermore, it is easy to prove that f(x) has its maximum
value at c as follows:

$$e^{-(x-c)^2/(2k^2)} = \frac{1}{e^{(x-c)^2/(2k^2)}}$$

which is less than 1 unless x = c, because e to any positive power
is greater than 1.

Further, if x = c, then $e^{-(x-c)^2/(2k^2)} = e^0 = 1$. Therefore,

$$\frac{1}{\sqrt{2\pi}\,k}\, e^0 > \frac{1}{\sqrt{2\pi}\,k}\, e^{-(x-c)^2/(2k^2)} \quad \text{when } x \neq c$$

The foregoing discussion implies that when plotted a normal
distribution will slope symmetrically away from its mode.
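
The three properties discussed in this section (total area 1, symmetry about c, maximum at c) can be checked numerically. The sketch below assumes arbitrary illustrative constants c = 3 and k = 2 and uses a simple midpoint rule over a wide finite range in place of the infinite integral; none of these names or values come from the text.

```python
import math

def normal_density(x, c, k):
    """f(x) = 1/(sqrt(2*pi)*k) * exp(-(x - c)^2 / (2*k^2)), for k > 0."""
    return math.exp(-((x - c) ** 2) / (2 * k * k)) / (math.sqrt(2 * math.pi) * k)

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

c, k = 3.0, 2.0  # illustrative constants only

# Area under the curve; the tails beyond 12 standard deviations are negligible.
total_area = integrate(lambda x: normal_density(x, c, k), c - 12 * k, c + 12 * k)

# Same height 5 units above and 5 units below c.
symmetric = math.isclose(normal_density(c + 5, c, k), normal_density(c - 5, c, k))

# The height at c exceeds the height at any other point tried.
peak_at_c = all(
    normal_density(c, c, k) > normal_density(c + d, c, k) for d in (-3, -0.5, 0.5, 3)
)

print(round(total_area, 6), symmetric, peak_at_c)
```
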


EXERCISES 

9.2.1. Which of the following rules can give normal distributions? For each 
one which can, find c and k. 
a. $f(x) = e^{-x^2}$
c. f(x) = 3e~ ST(x ~ s)1

d. $f(x) = 3e^{-9\pi(x+3)^2}$

e. f(x) = be ' x ' [

f. $f(x) = e^{-bx^2+6bx-9b}$

9.2.2. Which of the following rules can give normal distributions? For each 
such rule, find the values of c and k. 

a. $f(x) = be^{-2x^2}$

b. $f(x) = be^{-x^2-2x-1}$

c. $f(x) = \pi e^{-dx^2}$

d. $f(x) = b\pi e^{-x^2}$

9.2.3. Write an approximation for the probability that of 147 chips drawn
at random from a bowl containing 5 black chips and 2 white chips, replacing 
after each draw, between 80 and 120 inclusive will be black. 

9.2.4. Write an approximation for the probability that of 100 chips drawn
at random from a bowl containing a very large number of chips, of which 1/4 are
white, between 20 and 30 inclusive will be white.

9.2.5. Write an approximation to the probability that not more than 150
clocks in a shipment of 1,000 will be defective, if the factory is shipping out 
10 per cent defectives on the average. 

9.2.6. Write an approximation to the probability that more than 80 per 
cent of a sample of 200 randomly selected students will be from urban areas, 
if the percentage of urban students in the entire population is 75 per cent. 

9.3. Moments of Normal Distributions. The definition of
the arithmetic mean (the first moment about zero) of a continuous
distribution was $E(X) = \int_R x f(x)\,dx$. In the case of a
normal distribution we have therefore

$$E(X) = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}\,k}\, e^{-(x-c)^2/(2k^2)}\, dx$$

We state without proof¹ that

$$\int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}\,k}\, e^{-(x-c)^2/(2k^2)}\, dx = c$$

Therefore the constant c is the arithmetic mean E(X).
Actually, this follows from the fact demonstrated in Sec. 9.2 that
a normal distribution is symmetrical about c, because the center
of any symmetrical distribution is the arithmetic mean.²

1 A proof is given in the Appendix, Sec. B.21.

2 Unless the mean does not exist at all. See Sec. 11.1 for an example.
We state also without proof¹ that

$$E[(X - c)^2] = \int_{-\infty}^{\infty} (x - c)^2\, \frac{1}{\sqrt{2\pi}\,k}\, e^{-(x-c)^2/(2k^2)}\, dx = k^2$$

Therefore the constant $k^2$ is the variance and k is the standard
deviation.

Because of these results, the equation for a normal distribution
is usually written $\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$, using the conventional
symbols $\mu$ and $\sigma$ for the mean and standard deviation
respectively.

We shall abbreviate the expression $\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$ by
simply $\phi(\mu,\sigma^2)$. To say, for example, that X is distributed
according to $\phi(5,49)$ means that $f(x) = \frac{1}{\sqrt{2\pi}\cdot 7}\, e^{-(x-5)^2/98}$.
We shall also use the abbreviation $N(\mu,\sigma^2)$ for the longer
expression “a normal distribution with mean $\mu$ and variance
$\sigma^2$,” or, depending upon the context, “a normal population with
mean $\mu$ and variance $\sigma^2$.”

It should be obvious from the equation for a normal distribution
that a normal distribution is completely specified if we know
its mean and variance. In other words, there cannot be two 
normal distributions which have the same mean and variance 
and yet which differ in some other parameter. 

We state without proof¹ that in any normal distribution the
amount of probability between the mean $\mu$ and a point b standard
deviations from $\mu$ depends only upon b, not upon the size of $\mu$ or $\sigma$.
This is stated formally as follows:

$$\int_{\mu_1}^{\mu_1+b\sigma_1} \phi(\mu_1,\sigma_1^2)\,dx = \int_{\mu_2}^{\mu_2+b\sigma_2} \phi(\mu_2,\sigma_2^2)\,dx$$

for any values of $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and b. It so happens that
$\int_{\mu}^{\mu+b\sigma} \phi(\mu,\sigma^2)\,dx$ = .3413 for b = 1, .4772 for b = 2, and .4987 for
b = 3. These probabilities are indicated in Fig. 9.3.1.

The probability density (height of the curve) varies, of course,
with different values of $\sigma$, even when b is the same. This is the
reason that we cannot put a numerical scale on the ordinate in
Fig. 9.3.1.

1 A proof is given in the Appendix, Sec. B.21.
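
The claim that the probability between the mean and a point b standard deviations away depends only on b can be illustrated numerically. The sketch below is an illustration under assumed parameter values (means 0 and 50, standard deviations 1 and 7, chosen arbitrarily); it integrates the density with a midpoint rule rather than consulting tables.

```python
import math

def prob_mean_to_b_sd(mu, sigma, b, steps=20_000):
    """Midpoint-rule integral of the N(mu, sigma^2) density from mu to mu + b*sigma."""
    lo, hi = mu, mu + b * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    return total * h

for b in (1, 2, 3):
    first = prob_mean_to_b_sd(0.0, 1.0, b)    # illustrative parameters
    second = prob_mean_to_b_sd(50.0, 7.0, b)  # a different mean and spread
    print(b, round(first, 4), round(second, 4))
```

Both columns should agree with each other and with the values quoted in the text to within rounding.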



9.4. The Special Case c (= $\mu$) = 0 and k (= $\sigma$) = 1. In Sec.
4.7 the student was asked to prove that if a variate X is transformed
by a linear transformation into a variate Y, that is, if
$y_i = ax_i + b$, in which a and b are constants, then $m_y = am_x + b$
and $s_y^2 = a^2 s_x^2$, in which m means the arithmetic mean and s
means the standard deviation.

This result, which can also be obtained for continuous distributions
(the student should prove this as an exercise),
implies that we can take a normally distributed variate X and by
a suitable transformation obtain a new variate Y such that the
mean of Y is 0 and the variance is 1. To find values of a and b
which accomplish this, simply set $a\mu_x + b = 0$ and $a^2\sigma_x^2 = 1$.
Then $a^2 = 1/\sigma_x^2$ or $a = 1/\sigma_x$ (taking positive root only) and
$b = -a\mu_x = -\mu_x/\sigma_x$.

We state without proof¹ that a linear transformation leaves
the functional form of a normal distribution unchanged;
therefore, Y will also be normally distributed.

In summary we state the following theorem.

Theorem 9.4. If X is distributed according to $\phi(\mu_x,\sigma_x^2)$ and

$$Y = \frac{X}{\sigma_x} - \frac{\mu_x}{\sigma_x}$$

then Y is distributed according to $\phi(0,1)$.

1 A proof is given in the Appendix, Sec. B.21.
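
The choice a = 1/σ and b = -μ/σ can be tried on the sample analogue of the result from Sec. 4.7. The following sketch uses a small hypothetical data set and computes the variance with divisor n; the data and function names are assumptions made only for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with divisor n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def standardize(values):
    """Apply y = a*x + b with a = 1/s_x and b = -m_x/s_x."""
    m_x = mean(values)
    s_x = math.sqrt(variance(values))
    a = 1 / s_x
    b = -m_x / s_x
    return [a * x + b for x in values], a, b

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations
ys, a, b = standardize(xs)

# The transformed values have mean a*m_x + b = 0 and variance a^2 * s_x^2 = 1.
print(round(mean(ys), 12), round(variance(ys), 12))
```
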
Because of the great usefulness of normal distributions (see
Sec. 9.5 below) and because any normally distributed variate can
be so easily transformed into a variate having the distribution
N(0,1), extensive tables showing N(0,1) can be found in most
books on statistics. In this book tables can be found in
Appendix D.

The relation of N(0,1) to any other normal distribution can
be understood by examining the equation whereby we transformed
$x_i$ into $y_i$, that is, $y_i = (x_i - \mu_x)/\sigma_x$. The numerator is
the distance between the value $x_i$ and the mean $\mu_x$. When we
divide this distance by the standard deviation $\sigma_x$, we obtain the
number of standard deviations a given value is from the mean of the
distribution. (This statement holds regardless of whether X is
normally distributed or not.) Theorem 9.4 therefore implies
that the probability that a normal variate X will take a value
which is b standard deviations or more above the mean is simply
$\int_b^{\infty} \phi(0,1)\,dx$, which is given in the tables. The probability
that X will have a value b standard deviations or more in the
negative direction is $\int_{-\infty}^{-b} \phi(0,1)\,dx$. Similarly, the probability
that x will lie within b standard deviations of the mean is
$\int_{-b}^{b} \phi(0,1)\,dx$. More generally, the probability that x will lie
between b standard deviations and d standard deviations from
the mean, where b < d, is $\int_b^d \phi(0,1)\,dx$.

We now see how we can use tables to approximate any desired
portion of a binomial distribution, as stated in Sec. 9.1. Suppose,
for example, that n = 100 and p = 1/3 and we wish
to approximate $\sum_{x=25}^{40} C^{100}_x (1/3)^x (2/3)^{100-x}$. Since np = 33 1/3 and
$\sqrt{npq} = \sqrt{200/9} \approx 4.71$, we find the number of $\sigma$'s that 24.5 is from the
mean by taking

$$\frac{24.5 - 33.33}{4.71} = \frac{-8.83}{4.71} = -1.87$$

Similarly 40.5 is 1.52 $\sigma$'s from the mean. Therefore our approximation
is $\int_{-1.87}^{1.52} \phi(0,1)\,dx$, which by our tables we find to be
.9050. More generally, to approximate $\sum_{x=a}^{b} C^n_x p^x q^{n-x}$ we take

$$\int_{(a - .5 - np)/\sqrt{npq}}^{(b + .5 - np)/\sqrt{npq}} \phi(0,1)\,dx$$

which we find from the tables.
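
The general rule just stated can be sketched in code, with the table lookup replaced by the error function. This is a minimal illustration for n = 100, p = 1/3, and the range 25 to 40, with the standard deviation taken as the square root of npq = 200/9; the function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Area under N(0,1) to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binomial_range_exact(a, b, n, p):
    """Exact sum of binomial terms for x = a, ..., b."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a, b + 1))

def binomial_range_normal(a, b, n, p):
    """Normal approximation with limits a - .5 and b + .5 in standard units."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z_lo = (a - 0.5 - mean) / sd
    z_hi = (b + 0.5 - mean) / sd
    return std_normal_cdf(z_hi) - std_normal_cdf(z_lo), z_lo, z_hi

approx, z_lo, z_hi = binomial_range_normal(25, 40, 100, 1 / 3)
exact = binomial_range_exact(25, 40, 100, 1 / 3)
print(f"limits: {z_lo:.2f} to {z_hi:.2f}; approx = {approx:.4f}; exact = {exact:.4f}")
```
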

$\phi(0,1)$ and $\phi(0,100)$ are illustrated in Fig. 9.4.1.

Fig. 9.4.1. Normal distributions with 0 mean and variances of 1 and 100
respectively.

EXERCISES 

9.4.1. Find $\mu$ and $\sigma$ for each of the following distributions:

a. $f(x) = 3e^{-9\pi x^2}$

b. $f(x) = \frac{1}{2\sqrt{\pi}}\, e^{-x^2/4 + x/2 - 1/4}$

9.4.2. Find $\mu$ and $\sigma$ for each of the following distributions:

a. f(x) = - e *• 

7T 

n a; 2 3a; 9 

b. f(x) =-£ =e -8+T-8 

V 7T 

9.4.3. Find the probability between — 1 and 2 for each of the distributions 
in Exercise 9.4.1. 

9.4.4. Find the probability between 1 and 3 for each of the distributions in 
Exercise 9.4.2. 

9.4.5. For the distribution N(2,9), find the proportion

a. between 3 and 4

b. less than 3

9.4.6. If scores on an aptitude test have approximately the distribution
N(100,100), find the proportion of scores

a. between 115 and 130

b. greater than 70

9.4.7. Evaluate the approximation obtained in Exercise 9.2.3. 

9.4.8. Evaluate the approximation obtained in Exercise 9.2.4. 

9.4.9. Find the probability that a randomly selected member of a normally 
distributed population will lie within 2 standard deviations of the mean. 

9.4.10. Find the probability whose approximation was asked for in
Exercise 9.2.5.

9.5. Tests of Hypotheses and Confidence Intervals. Suppose
we know X to be distributed normally and we know the variance
$\sigma^2$ but not the mean $\mu$. We hypothesize a value $\mu_H$. How can
we test our hypothesis by drawing a member of the population
at random? The student should, on the basis of our discussion
of tests of hypotheses in Chap. 3, be able to answer this question
for himself, but we shall describe the procedure.

According to our hypothesis, .95 of the population lie within
1.96 $\sigma$'s (whose value we know) of $\mu_H$, .96 of the population lie
within 2.06 $\sigma$'s of $\mu_H$, etc. [We obtain these values from our
tables of N(0,1).] Therefore, we draw a member of the population
at random. If this member lies too far from $\mu_H$, then we
reject our hypothesis at the appropriate confidence level. In
other words, we compute $(x - \mu_H)/\sigma$ (the number of $\sigma$'s that x
lies from $\mu_H$), which, if our hypothesis were correct, would have
the distribution N(0,1). If our result departs too far from
expectation, then we reject our hypothesis at the appropriate
confidence level.
Example. We know that a certain population has the distribution $N(\mu, 25)$.
We wish to test the hypothesis that $\mu$ = 100. We draw a member of the
population at random and observe its value to be 115. We compute

$$\frac{x - \mu_H}{\sigma} = \frac{115 - 100}{5} = 3.00$$

Thus, referring to our tables of N(0,1), we reject our hypothesis at the .003
level of confidence.

Suppose we wish to test the hypothesis that $\mu$ has the value
$\mu_H$ or greater. In this case, we decide before drawing our sample
that we shall make our test in one direction only. In other
words, we are so confident that x is going to have a value smaller
than $\mu_H$ that we give up the possibility of making a test in case
x turns out to be greater than $\mu_H$. From tables of N(0,1) we see
that if $\mu$ has the value $\mu_H$ then the probability that $(x - \mu_H)/\sigma$
will have a value less than -1.65 is .05, less than -1.75 is .04,
etc. If our result departs too far from expectation, we can
reject our hypothesis at the indicated level of confidence.

Example. We know that X is distributed according to $N(\mu,100)$. We
wish to test the hypothesis that $\mu$ is at least 80 (80 or greater). We draw a
member at random and find its value to be 62. We compute

$$\frac{62 - 80}{10} = -1.80$$

Referring to our table, we see that we can reject our hypothesis at the .04 level
of confidence. If we had obtained a value of 98, however, we could not have
rejected the hypothesis that $\mu$ = 80. (Why not?)

The reader may feel that drawing a sample of size 1 is not 
going to give us a very powerful test of our hypothesis. He is 
quite right. By taking larger samples we can increase the power 
of our test tremendously. 

We now state without proof¹ the following theorem.

Theorem 9.5.1. Let $X_1$ be distributed according to $\phi(\mu_1,\sigma_1^2)$,
$X_2$ be distributed according to $\phi(\mu_2,\sigma_2^2)$, . . . , $X_n$ be distributed
according to $\phi(\mu_n,\sigma_n^2)$. Then if all the X's are chosen
independently, the sum $\sum_{i=1}^{n} X_i$ is distributed according to
$\phi(\mu,\sigma^2)$, where $\mu = \sum_{i=1}^{n} \mu_i$ and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$. (The sum of n
mutually independent normally distributed variates is normally
distributed with a mean equal to the sum of the means and a
variance equal to the sum of the variances.)

Corollary. Consider a random sample of size n from a population
having the distribution $N(\mu,\sigma^2)$. Then $\sum_{i=1}^{n} X_i$ has the
distribution $N(n\mu,n\sigma^2)$.

Remembering that m (the sample mean) = $\frac{\sum X_i}{n}$ and using
the relation stated at the beginning of Sec. 9.4, we have

$$\mu_m = \frac{n\mu}{n} = \mu \qquad \text{and} \qquad \sigma_m^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

In summary we state the following theorem.

Theorem 9.5.2. Consider a random sample of size n drawn
from a population having the distribution $N(\mu,\sigma^2)$. Then the
mean of the sample m will have the distribution $N(\mu, \sigma^2/n)$.

Corollary. $\frac{m - \mu}{\sigma/\sqrt{n}}$ will have the distribution N(0,1).

Thus we see that by taking a sample of size n we multiply our
ratio $(m - \mu_H)/\sigma$ by $\sqrt{n}$, thus greatly increasing the power of
our test.

1 A proof is given in the Appendix, Sec. B.21.



Example. We know that a certain population is distributed according to
$N(\mu,36)$. We wish to test the hypothesis that $\mu$ = 20. We draw a sample of
9 and find that m = 23.6. We compute

$$\frac{m - \mu_H}{\sigma/\sqrt{n}} = \frac{23.6 - 20}{6/\sqrt{9}} = 1.80$$

Therefore we cannot reject our hypothesis.
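
The three worked examples of this section follow one pattern: form the ratio of the deviation to its standard deviation, then read a tail area from N(0,1). The sketch below assumes Python's `statistics.NormalDist` in place of the printed tables; the function name and the `tail` argument are illustrative choices, not notation from the text.

```python
from statistics import NormalDist

def z_test(sample_mean, mu_h, sigma, n=1, tail="two-sided"):
    """Return the ratio (m - mu_H) / (sigma / sqrt(n)) and its N(0,1) tail area."""
    z = (sample_mean - mu_h) / (sigma / n**0.5)
    std = NormalDist()  # N(0, 1)
    if tail == "two-sided":
        area = 2 * (1 - std.cdf(abs(z)))
    elif tail == "lower":  # reject only when the observation is too small
        area = std.cdf(z)
    elif tail == "upper":  # reject only when the observation is too large
        area = 1 - std.cdf(z)
    else:
        raise ValueError("tail must be 'two-sided', 'lower', or 'upper'")
    return z, area

z1, p1 = z_test(115, 100, 5)                # sample of 1, N(mu, 25)
z2, p2 = z_test(62, 80, 10, tail="lower")   # sample of 1, N(mu, 100), one direction
z3, p3 = z_test(23.6, 20, 6, n=9)           # sample of 9, N(mu, 36)
print(round(z1, 2), round(z2, 2), round(z3, 2))
```

The third case shows the effect of the sample size: the same deviation of 3.6 with a sample of 1 would give a ratio of only 0.60.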

Using the same kind of reasoning as above we can easily obtain
a confidence interval for $\mu$, if we know $\sigma^2$.

First we consider a sample of size 1. We know that
$f(x) = \phi(\mu,\sigma^2)$. This means that prob (a < x < c) = $\int_a^c \phi(\mu,\sigma^2)\,dx$, in which a
and c are any two constants. Let us choose a and c so that they
are b standard deviations from $\mu$, that is, let $a = \mu - b\sigma$ and
$c = \mu + b\sigma$. We then have

$$\text{prob}\,(\mu - b\sigma < x < \mu + b\sigma) = \int_{\mu-b\sigma}^{\mu+b\sigma} \phi(\mu,\sigma^2)\,dx$$

But prob ($\mu - b\sigma < x < \mu + b\sigma$) is just the probability that x
will lie within b standard deviations of $\mu$. Now obviously if x
lies within b $\sigma$'s of $\mu$, then $\mu$ lies within b $\sigma$'s of x and vice versa.
That is, if $\mu - b\sigma < x < \mu + b\sigma$ then $x - b\sigma < \mu < x + b\sigma$ and
vice versa.¹ Therefore

$$\text{prob}\,(x - b\sigma < \mu < x + b\sigma) = \text{prob}\,(\mu - b\sigma < x < \mu + b\sigma) = \int_{\mu-b\sigma}^{\mu+b\sigma} \phi(\mu,\sigma^2)\,dx$$

which, by our discussion in Sec. 9.4, is simply $\int_{-b}^{b} \phi(0,1)\,dx$.
In summary we state the following theorem.

Theorem 9.5.3. Consider a random variate X from $N(\mu,\sigma^2)$.
Then prob ($x \pm b\sigma$ includes $\mu$) = $\int_{-b}^{b} \phi(0,1)\,dx$.

We can therefore find the p confidence interval by finding from
our tables of N(0,1) the value of b for which

$$\int_{-b}^{b} \phi(0,1)\,dx = 1 - p$$


Example. We know that a certain population is distributed according to
$\phi(\mu,25)$. We wish to find the .01 confidence interval for $\mu$. We draw a
member of the population at random and find its value to be 75. Referring to
tables of N(0,1) we see that our confidence interval is

75 ± 2.58(5) = 75 ± 12.90

If we had decided beforehand that our interval would have a lower bound
only, we should have found our interval to be 75 - 2.33(5) to $\infty$.


Similarly, using Theorem 9.5.2, that $f(m) = \phi(\mu, \sigma^2/n)$, we
can prove that

$$\text{prob}\left(m \pm b\,\frac{\sigma}{\sqrt{n}} \text{ includes } \mu\right) = \int_{-b}^{b} \phi(0,1)\,dx$$



1 The equivalence can be proved formally by adding - x to each term of 
the first inequality (adding equals to unequals leaves them unequal) and then 
multiplying through by —1 (which changes the direction of the inequality signs). 
The same procedure changes the second into the first. 
Example. We know a certain population to be distributed according to
$\phi(\mu, 81)$. We wish to find the .05 confidence interval. Drawing a sample of 4,
we find m to be 50. Our confidence interval is 50 ± 1.96(9/2) = 50 ± 8.82.

Remember that we can speak of the probability that $m \pm b\,\sigma/\sqrt{n}$
includes $\mu$ only before we draw our sample. Afterward we are
more or less confident that it includes $\mu$.
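
The interval computations in the two examples above can be sketched as follows. This is an illustration that assumes `statistics.NormalDist.inv_cdf` as a stand-in for reading b from the tables; the function name and argument names are hypothetical. The `level` argument is the confidence level in the sense used in this section (.05, .01), so the interval is built to include the mean with probability 1 - level.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level):
    """Two-sided interval m +/- b*sigma/sqrt(n) with known sigma."""
    b = NormalDist().inv_cdf(1 - level / 2)  # e.g. about 1.96 for level = .05
    half_width = b * sigma / n**0.5
    return sample_mean - half_width, sample_mean + half_width

lo_a, hi_a = confidence_interval(75, 5, 1, 0.01)   # sample of 1 from phi(mu, 25)
lo_b, hi_b = confidence_interval(50, 9, 4, 0.05)   # sample of 4 from phi(mu, 81)
print(round(lo_a, 2), round(hi_a, 2))
print(round(lo_b, 2), round(hi_b, 2))
```

Because the code uses b to more decimal places than the tables (about 2.576 rather than 2.58), the first half-width comes out slightly below 12.90.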

EXERCISES 

9.5.1. If X is distributed according to $\phi(5,4)$, what is the probability that a
member of the population chosen at random will have a value

a. in the interval 4-6?

b. in the interval 3-6?

c. greater than 6?

9.5.2. If X is distributed according to $\phi(1,9)$, what is the probability that
a sample of 9 will have a mean

a. in the interval 0-2? 

b. in the interval ( —1)-2? 

c. less than 3? 

9.5.3. Suppose X is normally distributed with known variance of 225. Find 
the value a sample of 1 would have to have in order to reject (at the .05 level 
of confidence) the hypothesis 

a. that the mean is 30 

b. that the mean is at least 30 

c. that the mean is less than 100 

9.5.4. If X is distributed according to $\phi(\mu,25)$, show how, with a sample of 1,
you can obtain a .05 confidence interval for $\mu$.

9.5.5. If $f(x) = \phi(\mu_x, 4)$ and $f(y) = \phi(\mu_y, 9)$, what is f(x + y)? (Assume X
and Y are chosen independently.)

9.5.6. Show how, with a sample of 5 drawn from $N(\mu, 4)$, you can test the
hypothesis that

a. $\mu$ = 0

b. $\mu$ < 1

9.5.7. The following is a sample of 4 drawn from $N(\mu, 16)$: (2.5, $X_1$), (3, $X_2$),
(2, $X_3$), (2.5, $X_4$). Find the .01 confidence interval for $\mu$ based on this sample.

9.5.8. The mean of a sample of 9 drawn from $N(\mu, 100)$ is 38. Find the .02
confidence interval for $\mu$ with

a. both upper and lower bound 

b. lower bound only 

c. upper bound only 

9.5.9. In Exercise 9.5.3, find the power of the test of each hypothesis if $\mu$
is actually

a. 50

b. 10

c. 110

9.5.10. In Exercise 9.5.6, find the power of the test of each hypothesis
(requiring the .01 level for rejection) if $\mu$ is actually

a. 2

b. 3

9.5.11. The weight of a certain canned product is distributed approximately
normally with a variance of 25 oz. Find the size sample necessary to
estimate, at the .01 level of confidence, the mean weight within 1 oz.

9.6. The Central-limit Theorem. We now come to one of the
most remarkable and useful theorems in the field of statistics,
which we state without proof.¹

Central-limit Theorem. Consider any population, discrete
or continuous, with finite mean $\mu$ and finite variance $\sigma^2$. The
sampling distribution of the mean of a random sample of size
n will approach $N(\mu, \sigma^2/n)$ as n approaches infinity; more
precisely,

$$\lim_{n\to\infty} \text{prob}\,(a < m < b) = \int_a^b \phi(\mu, \sigma^2/n)\,dx$$

in which a and b are any constants.

Corollary.

$$\lim_{n\to\infty} \text{prob}\left(a < \frac{m - \mu}{\sigma/\sqrt{n}} < b\right) = \int_a^b \phi(0,1)\,dx$$

This theorem implies, among other things, that we can make
up a hypothesis about the size of $\mu$, draw a sample, and,
provided that n is large, test our hypothesis by using the tables for
N(0,1). To make this test we simply compute the mean of the
sample m and the variance $s^2$; then, using $s^2$ as our estimate² of
$\sigma^2$, compute $\frac{m - \mu_H}{s/\sqrt{n}}$, in which $\mu_H$ is the hypothesized value of $\mu$.
If our hypothesis is correct, that is, if $\mu_H = \mu$, then $\frac{m - \mu_H}{s/\sqrt{n}}$ will
have the approximate distribution N(0,1). If our actual result
departs too far from expectation, then we reject our hypothesis
at the indicated confidence level. The reader may raise the
objection that our estimate of the population variance may be
incorrect. This is true. However, we used a good estimate,
and there is no reason for believing the variance estimate to be
seriously wrong; therefore, the best working hypothesis is that
our hypothesis about the population mean is incorrect.
Furthermore, with large samples $s^2$ nearly always has a very small
percentage error.

1 Proofs are given by Cramer (3), Wilks (12), Hoel (8), and Mood (9).

2 It should seem reasonable to the student to use $s^2$ as an estimate of $\sigma^2$. This
estimate is justified by the fact that, for any $\epsilon > 0$, $\lim_{n\to\infty} \text{prob}\,(|s^2 - \sigma^2| > \epsilon) = 0$.
An estimate having this property is called consistent. Although consistent, $s^2$ is
biased in that its mean value for any n is not $\sigma^2$ but $\sigma^2(n - 1)/n$ (a proof is given in
the Appendix, Sec. B.20). An unbiased estimate of $\sigma^2$ is $s^2 n/(n - 1)$, which is
also consistent, and for this reason the sample variance is sometimes defined as
$\frac{\sum (x - m)^2}{n - 1} = s^2 n/(n - 1)$. For large n the factor n/(n - 1) is negligible.

Examples. 1. We wish to test the hypothesis that the mean of a population
is 5. We draw a sample of size 100 and find that the mean of our sample
m is 20 and the population variance estimate $\hat\sigma^2$ (= $s^2$) is 2,500. We then
compute

$$\frac{m - \mu_H}{\hat\sigma/\sqrt{n}} = \frac{20 - 5}{\sqrt{2{,}500/100}} = \frac{15}{\sqrt{25}} = 3$$

If our hypothesis were correct, we see by our table of N(0,1) that the
probability of obtaining a value of 3 or greater would have been less than
.0014. Similarly, the probability of obtaining a value of -3 or smaller would
have been less than .0014. Therefore, the probability of obtaining this great
a deviation from our expected value, 0, would have been less than .003.
Therefore, we reject our hypothesis at the .003 level of confidence. Further,
we can assert at the .003 level of confidence that the population mean is
greater than 5, because a hypothesized mean less than 5 would have given us
an even greater deviation and enabled us to reject that hypothesis at an even
lower level of confidence.

2. We wish to test the hypothesis that the mean of a population is 100 or
less. We draw a sample of 100 and find that m is 117 and $\hat\sigma^2$ is 10,000. Then

$$\frac{m - \mu_H}{\hat\sigma/\sqrt{n}} = \frac{117 - 100}{100/\sqrt{100}} = 1.70$$

We are justified in rejecting our hypothesis at the .05 level of confidence,
because before drawing our sample we assumed that our sample mean would
be greater than 100; in other words, we decided to make our test in a certain
direction only. If m had turned out to be 83, we could not reject the
hypothesis that $\mu$ = 100 at the .05 level of confidence, because doing so would imply
that, before drawing our sample, the probability of rejecting our hypothesis
at the .05 level of confidence would have actually been .10, if $\mu$ were actually
100. Our previous decision to make our test in one direction only, however,
made the probability of rejecting our hypothesis at the .05 level of confidence
only .05 (if the mean $\mu$ were 100), as it should be. If m had turned out to be
much smaller than 100, say 70, we might be very suspicious of the hypothesis
that $\mu$ = 100, but the proper level of confidence at which to reject this
hypothesis would require considerable thought.

It should be kept in mind that in computing $\frac{m - \mu_H}{\hat\sigma/\sqrt{n}}$ we are
finding the approximate number of standard deviations our
sample mean departs from the population mean in the expected
sampling distribution of the mean, because $\mu_H$ is the hypothetical
mean of the expected sampling distribution of the mean and
$\hat\sigma/\sqrt{n}$ is our estimate of the standard deviation of the sampling
distribution of the mean. If this is not perfectly clear to the
student he should reread the discussion of the relation between
$N(\mu,\sigma^2)$ and N(0,1) given in Sec. 9.4.

The statistic $\frac{m - \mu_H}{\hat\sigma/\sqrt{n}}$ is called the “critical ratio.”

Definition. The standard error of the mean $\sigma_m$ is the standard
deviation of the sampling distribution of the mean.

In sampling any population, $\sigma_m = \sigma/\sqrt{n}$; therefore, the
critical ratio may be written $\frac{m - \mu_H}{\sigma_m}$.
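
The critical ratio can be computed either from summary figures or from the raw sample. The following is a minimal sketch with hypothetical function names; the variance estimate uses divisor n, and the five-value sample at the end is far too small for the large-sample theory and serves only to show the arithmetic.

```python
import math

def critical_ratio_from_summary(m, variance_estimate, n, mu_h):
    """(m - mu_H) divided by the estimated standard error of the mean."""
    standard_error = math.sqrt(variance_estimate / n)
    return (m - mu_h) / standard_error

def critical_ratio(sample, mu_h):
    """Same ratio computed from raw observations, with s^2 using divisor n."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n
    return critical_ratio_from_summary(m, s2, n, mu_h)

ratio_1 = critical_ratio_from_summary(20, 2500, 100, 5)      # first example above
ratio_2 = critical_ratio_from_summary(117, 10000, 100, 100)  # second example above
ratio_toy = critical_ratio([1, 2, 3, 4, 5], 2)               # hypothetical toy data
print(ratio_1, ratio_2, round(ratio_toy, 4))
```
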

How large must n be for $\int_a^b \phi(0,1)\,dx$ to be a close enough
approximation to prob $\left(a < \frac{m - \mu}{\sigma/\sqrt{n}} < b\right)$ and for $s^2$ to be a close
enough approximation to $\sigma^2$ to justify the use of the critical
ratio in testing hypotheses and finding confidence intervals?
There is no general answer to this question, for the closeness of
approximation depends not only upon n but also upon the
distribution of the population sampled. A dichotomous population
is the one least favorable to the generation of a normal
sampling distribution, and if neither proportion is less than .20,
a quite reasonable approximation is obtained with a sample of
about 30. If one of the proportions is about .10, n should be
about 100 for a reasonable fit. For a population whose distribution
is roughly bell-shaped, a sample of less than 15 can give
a close approximation. Obviously any general statement about
the size of n has to be vague; many research workers adopt 30 as
a kind of rule-of-thumb definition of “large.” For many, perhaps
most, nondichotomous populations encountered in practice,
30 is ample for a good approximation.
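
For a dichotomous population the exact sampling distribution is binomial, so the quality of the normal approximation can be measured directly. The sketch below is one possible way to do this and is not a procedure from the text: it reports the largest gap between the exact cumulative probability and the normal value, using the half-unit adjustment of Sec. 9.1 (an assumption made here so that the comparison is not dominated by the discreteness of the counts).

```python
import math

def std_normal_cdf(z):
    """Area under N(0,1) to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def max_cdf_error(n, p):
    """Largest gap between the exact binomial cumulative probability and
    the normal approximation evaluated at x + .5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    cumulative = 0.0
    worst = 0.0
    for x in range(n + 1):
        cumulative += math.comb(n, x) * p**x * (1 - p) ** (n - x)
        approx = std_normal_cdf((x + 0.5 - mean) / sd)
        worst = max(worst, abs(cumulative - approx))
    return worst

for n, p in [(30, 0.5), (30, 0.2), (30, 0.1), (100, 0.1)]:
    print(n, p, round(max_cdf_error(n, p), 4))
```

The pattern to look for is that the error grows as the proportion moves away from .5 at a fixed n and shrinks again as n increases.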

The central-limit theorem is of considerably greater generality
than the student will probably recognize at first. For example,
it enables one to test hypotheses about proportions. Consider a
dichotomous population with two qualitative values instead of
quantitative ones. Call the values A and B, and call the
proportion of A's in the population p. If we transform the qualitative
values A and B into the quantities 1 and 0 respectively, then
the mean $\mu$ of the transformed population is simply p. Furthermore,
the mean m of a sample with transformed values is simply
p', the proportion of A's in the sample. According to the
central-limit theorem, $\lim_{n\to\infty} \text{prob}\,(a < p' < b) = \int_a^b \phi(p, \sigma^2/n)\,dx$.

For the transformed dichotomous population with mean p the
variance is

$$E[(X - \mu)^2] = p(1 - p)^2 + (1 - p)(0 - p)^2 = p(1 - p)$$

Note that q = 1 - p is the proportion of B's. Therefore the
variance can be written pq. The central-limit theorem becomes

$$\lim_{n\to\infty} \text{prob}\,(a < p' < b) = \int_a^b \phi(p, pq/n)\,dx$$

or

$$\lim_{n\to\infty} \text{prob}\left[a < \frac{p' - p}{\sqrt{pq/n}} < b\right] = \int_a^b \phi(0,1)\,dx$$

The testing of a hypothesis about the mean of the transformed
population is thus a test of a hypothesis about the
proportion of A's! Note that in this test we do not need to estimate
$\sigma^2$ from the sample; our hypothesis about p provides us with our
estimate of $\sigma^2$.
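
A test of a hypothesized proportion along these lines can be sketched as follows. The counts used (230 A's in a sample of 400, hypothesis p = .5) are hypothetical and chosen only to give round numbers; the function name is likewise an assumption.

```python
from statistics import NormalDist

def proportion_critical_ratio(count_a, n, p_h):
    """(p' - p) / sqrt(pq/n), with the variance pq taken from the hypothesis."""
    p_prime = count_a / n
    q_h = 1 - p_h
    return (p_prime - p_h) / (p_h * q_h / n) ** 0.5

z = proportion_critical_ratio(230, 400, 0.5)     # hypothetical sample
two_sided_area = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(two_sided_area, 4))
```

Note that no sample variance is computed: the standard error comes entirely from the hypothesized p, as the text points out.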

9.7. Confidence Interval for a Mean of a Nonnormal Population.
On the basis of the central-limit theorem we can easily
derive a theorem which enables us, with a large sample, to obtain
a confidence interval for the mean of any population whatsoever
(as long as it has finite mean and finite variance) with very little
computation.

The central-limit theorem states

$$\lim_{n\to\infty} \text{prob}\left(a < \frac{m - \mu}{\sigma/\sqrt{n}} < c\right) = \int_a^c \phi(0,1)\,dx$$

Reasoning in exactly the same way that we did in Sec. 9.5, we
obtain

$$\lim_{n\to\infty} \text{prob}\left(m \pm b\,\frac{\sigma}{\sqrt{n}} \text{ includes } \mu\right) = \int_{-b}^{b} \phi(0,1)\,dx$$

Example. We wish to find the .01 confidence interval for the mean of a
population with a completely unknown distribution (except that we know that
it has finite mean and finite variance). We draw a large sample, say n = 100,
and compute the sample mean and the estimate of the population variance
$s^2$. We find that m = 60 and $s^2$ = 81. We then compute $m \pm b(s/\sqrt{n})$,
which in this case is 60 ± 2.58(9/10).

An important special case is that of finding a confidence
interval for the proportion of members having a specified value
in a dichotomous population. As in Sec. 9.6, we transform the
specified value into 1 and the other values into 0. The proportion
p having the specified value is then the mean of the transformed
population. As before we have

$$\lim_{n\to\infty} \text{prob}\left(a < \frac{p' - p}{\sqrt{pq/n}} < c\right) = \int_a^c \phi(0,1)\,dx$$

in which p' is the proportion for a sample of size n. Then

$$\lim_{n\to\infty} \text{prob}\left(p' \pm b\sqrt{pq/n} \text{ includes } p\right) = \int_{-b}^{b} \phi(0,1)\,dx$$

To find the confidence limits for p we take

$$p' \pm b\sqrt{\frac{p(1 - p)}{n}} = p$$

and solve for the values of p. Note that these two equations
become

$$p' - p = \pm b\sqrt{\frac{p(1 - p)}{n}}$$

Squaring, we obtain

$$(p')^2 - 2pp' + p^2 = \frac{b^2 p(1 - p)}{n} = \frac{b^2 p}{n} - \frac{b^2 p^2}{n}$$

$$\left(\frac{b^2}{n} + 1\right) p^2 - \left(\frac{b^2}{n} + 2p'\right) p + (p')^2 = 0$$

The solution to a quadratic equation $ax^2 + bx + c = 0$ is

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

thus

$$p = \frac{(b^2/n) + 2p' \pm \sqrt{(b^4/n^2) + (4b^2p'/n) + 4(p')^2 - 4b^2(p')^2/n - 4(p')^2}}{(2b^2/n) + 2}$$

which simplifies to

$$p = \frac{(2np') + b^2 \pm b\sqrt{b^2 + 4np' - 4n(p')^2}}{2b^2 + 2n}$$

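
The simplified formula can be evaluated directly and checked against the equation it was derived from. The sketch below uses hypothetical inputs (p' = .30, n = 200, b = 1.96) and an illustrative function name.

```python
import math

def proportion_confidence_limits(p_prime, n, b):
    """Limits for p from (2np' + b^2 +/- b*sqrt(b^2 + 4np' - 4n(p')^2)) / (2b^2 + 2n)."""
    center = 2 * n * p_prime + b * b
    spread = b * math.sqrt(b * b + 4 * n * p_prime - 4 * n * p_prime**2)
    denominator = 2 * b * b + 2 * n
    return (center - spread) / denominator, (center + spread) / denominator

lower, upper = proportion_confidence_limits(0.30, 200, 1.96)  # hypothetical values
print(round(lower, 4), round(upper, 4))

# Each limit should satisfy |p' - p| = b * sqrt(p(1 - p)/n), the starting equation.
for limit in (lower, upper):
    gap = abs(0.30 - limit) - 1.96 * math.sqrt(limit * (1 - limit) / 200)
    print(abs(gap) < 1e-9)
```

The limits are not symmetric about p', because the standard error is evaluated at each candidate p rather than at p'.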


EXERCISES 

9.7.1. A sample of 49 from a population with unknown distribution has a 
mean of 50 and a variance of 9. 

a. Test the hypothesis that $\mu$ = 52.

b. Find the .05 confidence interval for $\mu$.

9.7.2. In an experiment on reaction time, 81 subjects gave a mean reaction
time of 354 msec, with a variance of 400 msec. Find the .05 confidence
interval for $\mu$ for the population from which these 81 reaction times can be
considered a random sample.

9.7.3. In Exercise 9.7.2, what would be the difficulty with this procedure 
if all 81 reaction times had come from the same subject? 

9.7.4. Nine hundred IQ scores are pulled at random from files containing 
the scores of 1,000,000 school children. The sample mean and variance are 
101 and 9 respectively. 
a. Test the hypothesis that $\mu$ < 100.

b. Find a .01 confidence interval for $\mu$ with lower bound only.

9.7.5. Eighteen patients are selected at random from a large hospital.
Each one is tested on two tests which are supposed to measure neurotic
tendencies. The 36 scores have a mean of 105 and a variance of 25. The
investigator wanted to test the hypothesis that the mean score in the hospital is 100.
He took the critical ratio $\frac{105 - 100}{5/6} = 6$, and rejected the hypothesis at the
.001 level of confidence. What is wrong with this test?

9.7.6. Four entirely unrelated hypotheses are tested with four different
samples. The following critical ratios were obtained: .90, 1.05, 1.35, .82. 
How can you test all four of these hypotheses simultaneously, using these 
critical ratios? 

9.7.7. A sample of 100 Democrats yields 65 who are in favor of a certain 
policy. 

a. Test the hypothesis that there are just as many Democrats opposed to 
the policy as there are in favor of it. 

b. Find the .01 confidence interval for the proportion of Democrats in favor 
of the policy. 

9.7.8. A sample of 225 plants from a forest yields 70 which are infected.

a. Test the hypothesis that 40 per cent of the plants are infected. 

b. Find the .03 confidence interval for the percentage of infected plants. 

9.7.9. How large a sample must be taken to ensure, at the .01 level of
confidence, that the proportion of life-insurance policies for amounts greater
than $10,000 has been estimated from the sample to within .02 of the true
proportion?

9.8. Difference between Two Independent Normally Distributed 
Variates. We state without proof¹ the following theorem. 

Theorem 9.8.1. Let X be distributed according to $\phi(\mu_x, \sigma_x^2)$ 
and Y be distributed according to $\phi(\mu_y, \sigma_y^2)$. Then $X - Y$ is 
distributed according to $\phi(\mu_x - \mu_y,\ \sigma_x^2 + \sigma_y^2)$ if X and Y are chosen 
independently. 

The reader should be able to prove for himself that 

$$\frac{x - y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2}}$$

is distributed according to $\phi(0,1)$ and to see the way in which a 
hypothesis about the value of the difference $\mu_x - \mu_y$ can be 
tested with a member from each population. Also, he should be 
able to show that 

$$\text{prob}\left(x - y \pm b\sqrt{\sigma_x^2 + \sigma_y^2} \text{ includes } \mu_x - \mu_y\right) = \int_{-b}^{b} \phi(0,1)\,dx$$

¹ A proof is given in the Appendix, Sec. B.21. 

Definition. The standard deviation of the sampling distribution 
of the difference between two variates is called the “standard 
error of the difference,” abbreviated $\sigma_{x-y}$. As we have just 
seen, if X and Y are independent, $\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$. 

Notice that X and Y can be any normally distributed variates 
whatsoever in Theorem 9.8.1. In particular they may be means 
of samples of size $n_x$ and $n_y$ respectively drawn from two normally 
distributed populations (see Theorem 9.5.2). Thus we 
have as a corollary that if $m_x$ and $m_y$ are the means of samples 
drawn from $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$ respectively, then 

$$f(m_x - m_y) = \phi\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$$

from which we have 

$$f\left(\frac{m_x - m_y - (\mu_x - \mu_y)}{\sqrt{(\sigma_x^2/n_x) + (\sigma_y^2/n_y)}}\right) = \phi(0,1)$$

and 

$$\text{prob}\left[m_x - m_y \pm b\sqrt{(\sigma_x^2/n_x) + (\sigma_y^2/n_y)} \text{ includes } \mu_x - \mu_y\right] = \int_{-b}^{b} \phi(0,1)\,dx$$

Thus we can test hypotheses about the value of $\mu_x - \mu_y$ and 
also find confidence intervals, if we know that the populations 
being sampled are normal and we know their variances. 
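
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original text: the function names, the multiplier b = 1.96 (a two-sided .05 interval), and the sample figures are all hypothetical.

```python
import math

def standard_error_of_difference(var_x, n_x, var_y, n_y):
    """Standard error of m_x - m_y for independent samples with known variances."""
    return math.sqrt(var_x / n_x + var_y / n_y)

def critical_ratio(m_x, m_y, var_x, n_x, var_y, n_y, hypothesized_diff=0.0):
    """(m_x - m_y - hypothesized difference) divided by its standard error."""
    se = standard_error_of_difference(var_x, n_x, var_y, n_y)
    return (m_x - m_y - hypothesized_diff) / se

def confidence_interval(m_x, m_y, var_x, n_x, var_y, n_y, b=1.96):
    """Interval (m_x - m_y) +/- b * standard error."""
    se = standard_error_of_difference(var_x, n_x, var_y, n_y)
    d = m_x - m_y
    return d - b * se, d + b * se

# Hypothetical data: known variances 16 and 9, sample sizes 64 and 36.
cr = critical_ratio(52.0, 50.0, 16.0, 64, 9.0, 36)
lower, upper = confidence_interval(52.0, 50.0, 16.0, 64, 9.0, 36)
print(round(cr, 3), round(lower, 3), round(upper, 3))
```

The critical ratio is referred to the tables for N(0,1) exactly as in the one-sample case; only the standard error changes.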

We also state the following theorem without proof. 

Theorem 9.8.2. Consider samples drawn independently and 
at random from two populations, each with finite mean and 
variance. The sampling distribution of the difference between 
the means $m_x - m_y$ will approach $N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$ as $n_x$ 
and $n_y$ approach infinity; more precisely, 

$$\lim_{n_x, n_y \to \infty} \text{prob}(a < m_x - m_y < b) = \int_a^b \phi\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right) dx$$


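Corollary 

$$\lim_{n_x, n_y \to \infty} \text{prob}\left[a < \frac{(m_x - m_y) - (\mu_x - \mu_y)}{\sqrt{(\sigma_x^2/n_x) + (\sigma_y^2/n_y)}} < b\right] = \int_a^b \phi(0,1)\,dx$$

Corollary 

$$\lim_{n_x, n_y \to \infty} \text{prob}\left[(m_x - m_y) \pm b\sqrt{(\sigma_x^2/n_x) + (\sigma_y^2/n_y)} \text{ includes } (\mu_x - \mu_y)\right] = \int_{-b}^{b} \phi(0,1)\,dx$$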


With large samples we can use the sample variances as estimates 
of the population variances and thus test hypotheses about 
the value of $\mu_x - \mu_y$, and also find confidence intervals. 

An important special case of Theorem 9.8.2 arises when each 
of the two populations is dichotomous. By transforming the 
values into 0 and 1, as discussed in Sec. 9.7, we obtain 

$$\lim_{n_x, n_y \to \infty} \text{prob}(a < p'_x - p'_y < b) = \int_a^b \phi\left(p_x - p_y,\ \frac{p_x q_x}{n_x} + \frac{p_y q_y}{n_y}\right) dx$$

in which $p'_x$ and $p'_y$ are sample proportions. Also, 

$$\lim_{n_x, n_y \to \infty} \text{prob}\left[a < \frac{p'_x - p'_y - (p_x - p_y)}{\sqrt{\dfrac{p_x q_x}{n_x} + \dfrac{p_y q_y}{n_y}}} < b\right] = \int_a^b \phi(0,1)\,dx$$


This corollary implies that with large samples we can test a 
hypothesis about the size of $p_x - p_y$, using $p'_x$ and $p'_y$ as estimates 
of the sizes of $p_x$ and $p_y$ respectively. In most cases our 
hypothesis will be that $p_x - p_y = 0$, that is, that $p_x = p_y$, in 
which case we can use $\dfrac{n_x p'_x + n_y p'_y}{n_x + n_y}$ as our estimate of $p_x$ (and $p_y$). 
Also, 

$$\lim_{n_x, n_y \to \infty} \text{prob}\left[p'_x - p'_y \pm b\sqrt{\frac{p_x q_x}{n_x} + \frac{p_y q_y}{n_y}} \text{ includes } p_x - p_y\right] = \int_{-b}^{b} \phi(0,1)\,dx$$

which enables us to find confidence intervals, using $p'_x$ and $p'_y$ as 
estimates of $p_x$ and $p_y$ respectively. 
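
As a minimal sketch of the two uses of this corollary (testing $p_x = p_y$ with a pooled estimate, and finding a confidence interval with the separate sample proportions), the following Python may help. The function names, the multiplier b = 1.96, and the sample figures are assumptions made for illustration only.

```python
import math

def pooled_proportion(p1, n1, p2, n2):
    """Weighted estimate of the common proportion when p_x = p_y is hypothesized."""
    return (n1 * p1 + n2 * p2) / (n1 + n2)

def critical_ratio_equal_proportions(p1, n1, p2, n2):
    """Critical ratio for the hypothesis p_x - p_y = 0, using the pooled estimate."""
    p = pooled_proportion(p1, n1, p2, n2)
    q = 1.0 - p
    se = math.sqrt(p * q / n1 + p * q / n2)
    return (p1 - p2) / se

def difference_interval(p1, n1, p2, n2, b=1.96):
    """Confidence interval for p_x - p_y, using the sample proportions separately."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - b * se, d + b * se

# Hypothetical samples of 100 each with sample proportions .60 and .50.
pooled = pooled_proportion(0.60, 100, 0.50, 100)
cr = critical_ratio_equal_proportions(0.60, 100, 0.50, 100)
lower, upper = difference_interval(0.60, 100, 0.50, 100)
print(round(pooled, 2), round(cr, 3), round(lower, 4), round(upper, 4))
```

The pooled estimate is used only for the test of equality; for the confidence interval no common value is hypothesized, so each sample proportion estimates its own population proportion.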


EXERCISES 

9.8.1. Suppose we plan to draw a member X at random from a population 
having the distribution N(100, 100) and a member Y at random from a 
population having the distribution N([illegible], 225). What is the 
probability that X will be 

a. [illegible] 

b. at least 10 greater than Y? 

9.8.2. Suppose we plan to draw a random sample of 9 from N(100, 25) and 
a random sample of 16 from N(95, 36). Call the mean of the first sample $m_x$ 
and the mean of the second sample $m_y$. What is the probability that $m_x$ 
will be 

a. [illegible] 

b. at least 2 greater than $m_y$? 

9.8.3. Samples of [illegible] and 25 are drawn at random from populations 
$N(\mu_x, 16)$ and $N(\mu_y, 36)$ respectively. The sample means are $m_x = 10$ and 
$m_y = 20$ and the variances are 25 and 25 respectively. 

a. Test the hypothesis that $\mu_x = \mu_y$. 

b. Find the .05 confidence interval for $\mu_x - \mu_y$. 

9.8.4. Random samples of 100 and 225 are drawn from two populations 
with unknown distributions. The sample means and variances are [illegible] 

a. Test the hypothesis that $\mu_x = \mu_y$. 

b. Test the hypothesis that $\mu_x > \mu_y$. 

c. Find the .01 confidence interval for $\mu_x - \mu_y$. 

d. Test the hypothesis that $\mu_x = 105$. 

e. Find the .01 confidence interval for $\mu_x$. 

f. Find the .01 confidence interval for $\mu_y$. 

9.8.5. Samples of 100 and 200 are drawn from two dichotomous populations. 
The respective sample proportions are $p'_x = .36$ and $p'_y = .45$. 

a. Test the hypothesis that $p_x - p_y = 0$. 

b. Test the hypothesis that $(p_y - p_x) > .15$. 

c. Find the .05 confidence interval for $p_x - p_y$. 

d. Find the .05 confidence interval for $p_x$. 

9.8.6. Samples of 100 and 225 are drawn from two dichotomous populations. 
The respective sample proportions are $p'_x = .70$ and $p'_y = .65$. 

a. Test the hypothesis that $p_x - p_y = 0$. 

b. Test the hypothesis that $(p_y - p_x)$ [illegible] .02. 

9.8.7. [illegible] absolute deviations of 225 products produced by machine A 
yield a mean of .00412678 and variance of .00000036. The absolute deviations 
of 225 products produced by machine B yield a mean of .00453860 and variance 
of [illegible] 

a. Does the manufacturer have good evidence on which to make a choice? 

b. Within what limits can the difference in mean precision between A and B 
be said to lie at the .01 level of confidence? 

c. What assumptions are made in regarding these samples as random? 

9.8.8. Each of 800 persons seated in a large auditorium is given a simple 
attitude test. The papers are collected and a mimeographed article is 
[illegible] each person. Two different articles, A and B, are handed out, 
each person receiving [illegible]. [illegible] the article each person takes 
another attitude test. Of the 400 people reading A, 80 have “higher” attitude 
scores on the second test; of those reading [illegible] 

a. [illegible] 

b. Find the .05 confidence interval for the difference between A and B in 
[illegible] 

c. What assumptions are made in regarding these samples as [illegible] 

9.8.9. A vaccine is tested on 400 volunteers and found to be 80 per 
cent effective. 

a. Find the .01 confidence interval for its effectiveness in the general 
population. 

b. What assumption is involved in regarding the volunteers as a random 
sample? 

9.8.10. Two different agricultural methods are compared for growing 
[illegible] with variance of 4; the mean [illegible] 

a. [illegible] 

b. Find the .05 confidence interval for the difference between the mean 
yields. 

c. What assumption is involved in regarding the yields as random samples? 

9.8.11. A physiological index of emotionality is taken under each of two 
experimental conditions, A and B, on each of 100 subjects. The mean and 
variance of the indices under condition A are 4.50 and 1.44 respectively. The 
mean and variance of the indices under condition B are 5.32 and [illegible] 
respectively. The mean and variance of the 100 differences between indices (one 
difference for each subject) are 0.82 and 1.21 respectively. What is the correct 
test of the hypothesis that the indices under the two conditions [illegible] 
sampling? Upon what theorem is the test based? Define the population sampled 
in making this test. Why is the other test that might be made with these data 
not valid? 

9.9. Fitting a Normal Distribution to a Sample. In Sec. 5.6 
we discussed fitting a binomial distribution and in Sec. 6.4 fitting 
a Poisson distribution. The principle of fitting a normal 
distribution is the same, that is, we construct a sample of the same 
size, having certain characteristics in common with our actual 
sample but having the desired kind of distribution, in this case a 
normal distribution (i.e., the discrete counterpart of a normal 
distribution). 

The characteristics which our constructed sample will have in 
common with our actual sample are simply the same size, mean, 
and variance. First of all we must have our data grouped into 
class intervals (see Sec. 4.8), for reasons which will be obvious 
once the complete process has been described. Next, after 
calculating the mean and standard deviation of the sample, we 
find, with the aid of the tables for N(0,1), the proportion of 
cases we should have for each class interval to obtain the 
discrete counterpart of a normal distribution. This process can 
best be explained by an example. Assume that a large number 
of observations have been made correct to the nearest tenth of a 
unit, and that the first two columns of the table on page 122 
represent the result of grouping the data into class intervals. 
The interval limits are the “true” limits; for example, the 
observation 7.4 is assumed to include all values from 7.35 to 7.45. 
Thus the highest interval is assumed to include all true values 
from 7.25 to ∞. In the calculation of the mean m and the 
standard deviation s, all observations are treated as though they 
fell at the mid-points of the class intervals. The theoretical 
proportions are found by taking the number of standard deviations 
each class limit is from the sample mean. Calling the class 
limit x, this amounts to the transformation $y = (x - m)/s$. If 



122 basic statistical concepts 

the original observations have the distribution $N(m, s^2)$, then the 
transformed values have the distribution N(0,1); therefore by 
using the tables for N(0,1) we can find what proportion of a 
normal distribution should lie within each of the class intervals. 
This computation has been carried out for the sample shown in 
the table, which has m = 4.945 and s = 1.081. For example, 


Interval        Observed proportion    Theoretical proportion 

7.25 to ∞               .01                    .01 
6.75-7.25               .03                    .03 
6.25-6.75               .08                    .07 
5.75-6.25               .11                    .12 
5.25-5.75               .16                    .16 
4.75-5.25               .21                    .18 
4.25-4.75               .14                    .14 
3.75-4.25               .12                    .13 
3.25-3.75               .07                    .09 
2.75-3.25               .04                    .04 
2.25-2.75               .03                    .02 
-∞ to 2.25              .00                    .01 


the interval 6.25-6.75 becomes transformed into the interval 
(6.25 - 4.945)/1.081 to (6.75 - 4.945)/1.081, which is the 
interval 1.207-1.670. From the tables for N(0,1) we find that the 
theoretical proportion for this interval is .07. As an exercise, the 
student should check this fit. 
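
The transformation $y = (x - m)/s$ and the table lookup can be sketched as follows. This is an illustration only: the function names are hypothetical, and the error function from Python's standard library is assumed as a stand-in for the printed tables of N(0,1). Only the interval worked out in the text is checked here.

```python
import math

def normal_cdf(z):
    """Proportion of N(0,1) lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def theoretical_proportions(limits, m, s):
    """Proportion of N(m, s^2) falling in each class interval.

    `limits` holds the interior class limits in increasing order; the
    lowest and highest intervals are treated as open-ended.
    """
    cdf = [0.0] + [normal_cdf((x - m) / s) for x in limits] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

limits = [2.25 + 0.5 * i for i in range(11)]      # 2.25, 2.75, ..., 7.25
props = theoretical_proportions(limits, m=4.945, s=1.081)

# props[9] is the interval 6.25-6.75 discussed in the text.
print(round(props[9], 2))
```

Multiplying each proportion by the sample size gives the frequencies of the constructed sample.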

As in the case of the binomial or the Poisson, we may wish to 
raise the question of how “good” a fit this represents, but we 
must defer this question until the next chapter. 




Chapter 10 
CHI SQUARE 


10.1. Definition of Chi Square. Consider a random sample 
of size n from a population having the distribution N(0,1), that 
is, normally distributed with zero mean and unit variance. The 
sum of squares of values of members of the sample is called 
chi square with n degrees of freedom. 

Definition. $\chi_n^2 = \sum_{i=1}^{n} x_i^2$, with each $x_i$ independently 
distributed according to $\phi(0,1)$. 

We state without proof¹ that the distribution of $\chi_n^2$ is given by 

$$f(\chi_n^2) = \frac{(\chi^2)^{(n/2)-1}\, e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$$

in which $\Gamma(n/2) = [(n/2) - 1]!$ if n is even and 
$[(n/2) - 1][(n/2) - 2] \cdots \tfrac{1}{2}\sqrt{\pi}$ if n is odd. This function ($\Gamma$) is 
called the gamma function. 

The sum of two independent $\chi^2$'s is itself distributed according 
to $\chi^2$, since 

$$\chi_{n_1}^2 + \chi_{n_2}^2 = \sum_{i=1}^{n_1} x_i^2 + \sum_{i=1}^{n_2} x_i^2 = \sum_{i=1}^{n_1+n_2} x_i^2 = \chi_{n_1+n_2}^2$$

and therefore² $f(\chi_{n_1}^2 + \chi_{n_2}^2) = f(\chi_{n_1+n_2}^2)$. 

¹ A proof is given in the Appendix, Sec. B.22. 

² This argument is not a proof, as it assumes that because the two $\chi^2$'s are 
independent the individual $x_i$'s are also independent. A proof is given in the 
Appendix, Sec. B.22. 

Frequency functions of $\chi_n^2$ for n = 1, 2, 6, 15 are illustrated in 
Fig. 10.1.1. 

We state without proof¹ that $\mu\,[= E(\chi_n^2)] = n$ and 

$$\sigma^2\,\{= E[(\chi_n^2 - n)^2]\} = 2n$$

Further, the mode is n - 2, except for n = 1. 
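
The statements about the mean and variance can be checked numerically. The sketch below is an illustration under assumptions, not a derivation: it integrates the frequency function by a simple midpoint rule (the function names, the upper limit of 200, and the step count are arbitrary choices) for n = 6, where the mean should be close to 6 and the variance close to 12.

```python
import math

def chi_square_density(x, n):
    """Frequency function of chi square with n degrees of freedom, for x > 0."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def moment(n, power, center=0.0, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[(chi2 - center) ** power]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (x - center) ** power * chi_square_density(x, n)
    return total * h

n = 6
mean = moment(n, 1)
variance = moment(n, 2, center=mean)
print(round(mean, 3), round(variance, 3))
```

For n = 1 the frequency function is unbounded near zero, so this crude rule would need more care there.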



10.2. Goodness of Fit When the Hypothetical Distribution Is 
Completely Specified. Suppose we have a random sample of 
size n from a population with unknown distribution, and we wish 
to test the hypothesis that the population is distributed in a 
certain way, which we specify completely, that is, we not only 
hypothesize the functional form of the distribution (normal, 
binomial, Poisson, etc.) but we hypothesize also the values of all 
parameters. In order to test our hypothesis we need merely to 
compute, on the basis of our hypothesis, the expected frequencies 
(theoretical frequencies) for each of the possible values (or value 
categories, for example, class intervals) in the population and to 
compare these expected or theoretical frequencies with those 
actually obtained in the sample. 

Let our sample values be tabulated into k groups, and let the 
frequencies in these groups be $f_{o1}, f_{o2}, \ldots, f_{ok}$ ($f_o$ meaning 
observed frequency). On the basis of our hypothesis we then 
calculate the corresponding theoretical frequencies $f_{t1}, f_{t2}, \ldots,$ 

¹ Derivations of the mean and variance are given in the Appendix, Sec. B.22. 

$f_{tk}$. We then form the statistic $\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}$. We state without 
proof the following theorem, first proved by K. Pearson.¹ 

Theorem 10.2.1 

$$\lim_{n \to \infty} F\left[\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}\right] = F(\chi_{k-1}^2)$$

in which the symbols have the meanings² described above. In 
other words, as our sample size increases, the distribution of 
$\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}$ becomes a closer and closer approximation to the 
distribution of $\chi_{k-1}^2$, regardless of the distribution of the original 
population. For this reason the statistic $\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}$ will be 
called $\chi^2$, trusting that the context will make clear whether we 
are speaking of this statistic or of $\chi^2$ as defined in the previous 
section. 

When all the theoretical frequencies $f_{ti}$ are at least 5 and 
$k \geq 5$, the approximation is sufficiently close for ordinary 
purposes. If $k < 5$ the $f_{ti}$ should be 10 at least. Whenever some of 
the $f_{ti}$ are too small, it is advisable to pool the smaller groups. 

¹ Proofs are given by Cramér (3) and Mood (9). 

² This theorem means $\lim_{n \to \infty} \text{prob}\left[\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}} < c\right] = \int_0^c f(\chi_{k-1}^2)\,d\chi^2$. 

The hypothesis that our sample was drawn from a population 
with the distribution which we have specified can be tested in a 
manner analogous to that in which we have previously tested 
hypotheses. Note that, if our theoretical frequencies fit our 
sample exactly, $\chi^2$ equals 0. The further our sample departs 
from expectation, the larger the value of $\chi^2$ we obtain. We find 
a critical value C such that if the hypothesis is correct the 
probability (before the sample is drawn) that $\chi^2$ will exceed C is only 
p (.05, .02, etc.). The critical value C for a given p is found by 
solving the following equation: 

$$1 - p = \int_0^C f(\chi_{k-1}^2)\,d\chi^2$$

In other words, a value C is found such that p of the probability 
will lie in the interval C to ∞. Values of C have been tabulated 
in Appendix D for various values of 1 - p and for n (degrees of 
freedom) ≤ 30. If the number of degrees of freedom is greater 
than 30, the following theorem can be used: 

Theorem 10.2.2. $\sqrt{2\chi_n^2}$ is approximately normally distributed 
with mean $\sqrt{2n - 1}$ and unit variance. 

Corollary. $\sqrt{2\chi_n^2} - \sqrt{2n - 1}$ has approximately the 
distribution N(0,1). 
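
The equation that defines the critical value C, given before Theorem 10.2.2, can also be solved numerically when no table is at hand. The following Python is a rough sketch under stated assumptions: the integral is approximated by a midpoint rule and the equation is solved by bisection; the function names, the step count, and the search limit of 200 are arbitrary choices, and the sketch is meant for two or more degrees of freedom.

```python
import math

def chi_square_density(x, n):
    """Frequency function of chi square with n degrees of freedom, for x > 0."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi_square_cdf(c, n, steps=5_000):
    """Midpoint-rule approximation of the integral of the density from 0 to c."""
    h = c / steps
    return h * sum(chi_square_density((i + 0.5) * h, n) for i in range(steps))

def critical_value(p, n, upper=200.0):
    """Solve 1 - p = integral from 0 to C of the density, by bisection."""
    lower = 0.0
    for _ in range(50):
        mid = (lower + upper) / 2
        if chi_square_cdf(mid, n) < 1 - p:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2

c_two = critical_value(0.05, 2)
c_three = critical_value(0.05, 3)
print(round(c_two, 2), round(c_three, 2))
```

For two degrees of freedom the integral has the closed form $1 - e^{-C/2}$, which gives a convenient check on the numerical result.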


Examples. 1. On each trial of a learning sequence, a monkey chooses one 
of two perceptually different objects. One is correct (filled with food); the 
other is incorrect. Of 500 trials, 300 are correct and 200 are incorrect. On 
the hypothesis that the monkey is responding at random on each trial we have 
theoretical frequencies of 250 and 250. Therefore, we have 

$$\chi^2 = \sum_{i=1}^{2} \frac{(f_{oi} - f_{ti})^2}{f_{ti}} = \frac{2(50)^2}{250} = 20$$

Referring to the Appendix, we see that for n = 1 the probability of obtaining 
a value of $\chi^2$ this large or larger is less than .005. Therefore we reject our 
hypothesis at the .005 level of confidence. 

Notice that we could have tested our hypothesis in this case also by the 
critical ratio, for 

$$\text{C.R.} = \frac{p' - p}{\sqrt{pq/n}} = \frac{.10}{\sqrt{1/2000}} = +\sqrt{20}$$

It is not a coincidence that the C.R. is a square root of $\chi^2$ for one degree of 
freedom. If our hypothesis were correct, the C.R. would have the 
distribution N(0,1). Therefore, the square of the C.R. would, by definition, be $\chi^2$ 
with one degree of freedom. This means that we could have computed $\chi_1^2$, 
on the basis of our hypothesis, by computing the C.R. and squaring. 
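
The agreement between the two computations in Example 1 can be verified directly. The sketch below is illustrative only; the function name `chi_square_statistic` is a hypothetical helper, not notation from the text.

```python
import math

def chi_square_statistic(observed, theoretical):
    """Sum of (f_o - f_t)^2 / f_t over all groups."""
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, theoretical))

observed = [300, 200]        # correct and incorrect trials
theoretical = [250, 250]     # responding at random
chi2 = chi_square_statistic(observed, theoretical)

n, p = 500, 0.5
critical_ratio = (observed[0] / n - p) / math.sqrt(p * (1 - p) / n)

print(chi2, round(critical_ratio ** 2, 6))
```

Both routes give 20, as the text asserts for one degree of freedom.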

2. Suppose we have the hypothesis that variations A, B, C, and D in a certain 
kind of plant will be distributed in the ratio 6:4:3:1. Of 200 plants 
observed, the respective frequencies are 90, 54, 44, 12. Each theoretical 
frequency is $200p_i$, where $p_i$ is the probability, on the basis of our hypothesis, 
that a plant selected at random will be of variety i. We then have the 
accompanying table: 

Variety    f_o    f_t     (f_o - f_t)^2 / f_t 

A          90     85.7         .216 
B          54     57.1         .168 
C          44     42.9         .028 
D          12     14.3         .370 

                               .782 

Since the probability of obtaining a $\chi_3^2$ this large or larger is >.80, we must 
consider the agreement good. 

If a second independent set of observations yields a $\chi_3^2$ of 6.015, we can test 
both sets of observations together by taking the sum $\chi_3^2 + \chi_3^2 = 6.797 = \chi_6^2$. 
We observe from our table that the probability of a $\chi_6^2$ of this size or larger is 
>.30. Thus the agreement is still acceptable. We could also have thrown 
our observations together and obtained a $\chi_3^2$ based on a larger sample. 
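
Example 2 can be retraced in a few lines. This is a sketch with hypothetical function names; the theoretical frequencies are carried without rounding, so the result (about .783) differs in the third decimal from the .782 obtained in the text from rounded entries.

```python
def theoretical_frequencies(ratio, n):
    """Split n observations according to a hypothesized ratio such as 6:4:3:1."""
    total = sum(ratio)
    return [n * r / total for r in ratio]

def chi_square_statistic(observed, theoretical):
    """Sum of (f_o - f_t)^2 / f_t over all groups."""
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, theoretical))

observed = [90, 54, 44, 12]
theoretical = theoretical_frequencies([6, 4, 3, 1], sum(observed))
chi2_first = chi_square_statistic(observed, theoretical)

# A second independent chi square with 3 degrees of freedom, as in the text.
chi2_second = 6.015
combined = chi2_first + chi2_second
combined_df = 3 + 3

print(round(chi2_first, 3), round(combined, 3), combined_df)
```

Adding the two values and their degrees of freedom is justified by the additive property of independent $\chi^2$'s stated in Sec. 10.1.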

3. Suppose we have a sample of 670 observations whose distribution is given 
by the following table: 

Value    Frequency        Value    Frequency 

 26          1               4        32 
 25          2               3        27 
 24          2               2        30 
 23          4               1        30 
 22          5               0        21 
 21          6              -1        18 
 20          5              -2        23 
 19          8              -3        17 
 18         10              -4        11 
 17          9              -5        13 
 16         12              -6        11 
 15         17              -7        15 
 14         15              -8         9 
 13         25              -9         8 
 12         22             -10         9 
 11         26             -11         5 
 10         31             -12         7 
  9         30             -13         3 
  8         25             -14         0 
  7         38             -15         4 
  6         33             -16         1 
  5         40 

Suppose we wish to test the hypothesis that our sample was drawn randomly 
from a population having the distribution N(5,49). To do this we calculate 
$f_{ti}$ in exactly the same way as described in Sec. 9.9, except that we use 5 as 
the mean and 49 as the variance in computing $f_{ti}$ instead of the sample mean 
and variance. We then obtain the following table: 

Interval            f_o       f_t 

21.5 to ∞            14       6.10 
18.5 to 21.5         19      11.86 
15.5 to 18.5         31      26.80 
12.5 to 15.5         57      50.59 
9.5 to 12.5          79      79.60 
6.5 to 9.5           93     104.32 
3.5 to 6.5          105     111.49 
0.5 to 3.5           87     104.32 
-2.5 to 0.5          62      79.60 
-5.5 to -2.5         41      50.59 
-8.5 to -5.5         35      26.80 
-11.5 to -8.5        22      11.86 
-∞ to -11.5          15       6.10 

$\chi_{12}^2$ calculated on the basis of the table is about 51, which allows us to reject 
our hypothesis at the .005 level of confidence. If the extreme theoretical 
frequencies had been less than 5, we should have combined the two extreme class 
intervals at each end and calculated $\chi_{10}^2$. In this case, calculation of $\chi_{10}^2$ 
yields about 48, which still allows us to reject our hypothesis at the .005 level 
of confidence. 
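
The two computations of Example 3 can be retraced from the tabled frequencies. The sketch below uses hypothetical helper names and takes the observed and theoretical frequencies exactly as printed; with these figures the sums come out at roughly 50 and 47, in the neighborhood of the rounded values quoted in the text.

```python
def chi_square_statistic(observed, theoretical):
    """Sum of (f_o - f_t)^2 / f_t over all groups."""
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, theoretical))

def pool_extremes(values):
    """Combine the two extreme groups at each end into one group."""
    return [values[0] + values[1]] + values[2:-2] + [values[-2] + values[-1]]

# Frequencies from the table, highest interval first.
f_o = [14, 19, 31, 57, 79, 93, 105, 87, 62, 41, 35, 22, 15]
f_t = [6.10, 11.86, 26.80, 50.59, 79.60, 104.32, 111.49,
       104.32, 79.60, 50.59, 26.80, 11.86, 6.10]

chi2_12 = chi_square_statistic(f_o, f_t)                                 # 13 groups
chi2_10 = chi_square_statistic(pool_extremes(f_o), pool_extremes(f_t))  # 11 groups

print(round(chi2_12, 1), round(chi2_10, 1))
```

Pooling reduces the number of groups from 13 to 11, and hence the degrees of freedom from 12 to 10.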

We can also make the $\chi^2$ test by considering that if our sample were drawn 
from a population having the distribution N(5,49) and if we transform each 
value x in our sample into y by the transformation $y = (x - 5)/7$, we should 
have a sample drawn from a population having the distribution N(0,1). The 
sum of the squares of the y's therefore should be distributed as $\chi_{670}^2$. As n is 
too large for our tables, we have to use the theorem that $\sqrt{2\chi_n^2} - \sqrt{2n - 1}$ 
has the distribution N(0,1). This test actually yields $\chi_{670}^2 = 876$, and 
$\sqrt{2\chi_{670}^2} - \sqrt{2(670) - 1} = 5.3$, a value which allows us to reject our 
hypothesis at the .001 level of confidence. Whereas our previous test is approximate, 
this test is exact, because by hypothesis $F\left(\sum_{i=1}^{670} y_i^2\right) = F(\chi_{670}^2)$; furthermore, 
our latter test would be applicable even if the sample were small. 
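
The arithmetic of this second test is short enough to sketch. The function names below are hypothetical; the figures 876 and 670 are those quoted in the text, and the three-value list is only a toy illustration of the transformation.

```python
import math

def sum_of_squared_standard_scores(values, mean, sd):
    """Sum of y^2 with y = (x - mean)/sd, using the hypothesized mean and sd."""
    return sum(((x - mean) / sd) ** 2 for x in values)

def normal_deviate_for_chi_square(chi2, n):
    """sqrt(2 chi2) - sqrt(2n - 1), referred to the tables for N(0,1) when n is large."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * n - 1)

# Toy illustration of the transformation with hypothesized N(5, 49).
toy = sum_of_squared_standard_scores([5, 12, -2], mean=5, sd=7)

# Figures quoted in the text.
z = normal_deviate_for_chi_square(876, 670)
print(toy, round(z, 1))
```

Note that this test uses each observation individually, so no grouping into class intervals is needed.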
EXERCISES 

10.2.1. Test the hypothesis that the following sample of IQ's is drawn at 
random from a population having the distribution N(100, 100): 

x (mid-point of interval)    Frequency 

        130                      2 
        125                      5 
        120                      7 
        115                     10 
        110                     20 
        105                     40 
        100                     60 
         95                     35 
         90                     30 
         85                     20 
         80                      8 
         75                      4 
         70                      1 


' oiiowing s * m,>,s is dra ™ •* 

f °"° wins “ mpie “ i '"" “ r “ d ° m 

10.2.4. According to theory, the progeny of a certain species are distributed 
among the three categories A, B, and C in the ratio 5:3:2. Are the following 
100 observations in accord with this theory? 

Category    Frequency 

A              50 
B              40 
C              10 


10.2.5. Test the hypothesis that the following sample is drawn randomly 
from a population whose distribution is given by $f(x) = (x^2 + 2x + 1)/91$: 

x    Frequency 

5       88 
4       40 
3       25 
2       15 
1       10 
0        4 

10.2.6. Test the hypothesis that the following critical ratios were all 
obtained by testing hypotheses that are correct: .20, 1.20, [illegible] 

10.2.7. Test the hypothesis that the following sample was drawn randomly 
from a population having a Poisson distribution with a mean of 2: 

x    Frequency 

10.2.8. A machine is constructed which, it is asserted, turns out digits “at 
random.” The following data are obtained by taking 100 runs of [illegible] 
each and counting the number of 4's in each run: 

x    Frequency 

Are these data consistent with the assertion that digits are turned out “at 
random”? What are some other tests that could be made? 

10.2.9. A product is marketed in two packages which are identical except 
for being of different colors, red and pink. The packages are mixed together 
thoroughly in a display and each customer serves himself. After 100 packages 
have been sold (this number having been determined in advance) it is 
found that 60 red and 40 pink packages have been sold. 

a. Is this result consistent with the hypothesis that color makes no 
difference? 

b. What does the assumption of random sampling mean in this case? 

10.2.10. Flip a coin as “randomly” as you can 100 times; record the number 
of heads and tails and test the hypothesis that your flips are unbiased. 

10.2.11. Toss a die 100 times and test your tosses for bias. 

10.3. Goodness of Fit When the Hypothetical Distribution Is 
Incompletely Specified. In many, if not most, applications we 
wish to test the hypothesis that our sample is drawn from a 
population having a certain form (that is, normal, binomial, 
Poisson, etc.), but we do not wish to include in our hypothesis 
specific parameter values. In order to compute theoretical 
frequencies for this case, we must estimate parameter values from 
our sample. In so doing, however, we obviously tend to make 
our sum $\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}$ smaller than it would be if we knew or 
hypothesized the parameter values. To compensate for this 
decrease in the value of $\chi^2$ we must therefore decrease the 
number of degrees of freedom whereby we interpret $\chi^2$. A 
remarkable theorem due to R. A. Fisher states that if we estimate 
parameters in such a way as to make $\chi^2$ as small as possible, it is 
necessary only to reduce the number of degrees of freedom by 
one for each parameter estimated from the sample. Suppose 
that we estimate r parameters from our sample; then, with the 
same meanings for the symbols as in Theorem 10.2.1, we have 
the following theorem. 


Theorem 10.3 

$$\lim_{n \to \infty} F\left[\sum_{i=1}^{k} \frac{(f_{oi} - f_{ti})^2}{f_{ti}}\right] = F(\chi_{k-r-1}^2)$$

Note that Theorem 10.2.1 is a special case of Theorem 10.3, 
that is, the case in which we estimate no parameters from the 
sample (r = 0). 

We state without proof¹ that for a given sample the estimates 
of the mean and variance of a normal population which minimize 
$\chi^2$ are the sample mean and variance. The estimate of the mean 
of a Poisson population which minimizes $\chi^2$ is the sample mean.¹ 

Examples. 1. Suppose that we have the sample given in Example 3, 
Sec. 10.2, and we wish to test the hypothesis that our sample was drawn from 
a normal population. We compute the theoretical frequencies in exactly the 
same way as before, except that we now use the sample mean and variance as 
our estimates of $\mu$ and $\sigma^2$. The student can verify that $\chi^2$ is about 7. This 
$\chi^2$ is to be interpreted on the basis of 13 - 2 - 1 = 10 degrees of freedom 
rather than 13 - 1 = 12 as before, since we estimated two parameters in 
making our fit. We find that we cannot reject the hypothesis that our sample 
is drawn from a normal population, although we can reject the hypothesis 
that it was drawn from N(5,49). 

2. Suppose we have a sample of observations and we wish to test the hypothesis 
that our sample is randomly drawn from a population having a Poisson 
distribution. We need merely divide all possible values of a Poisson 
distribution (0, 1, 2, . . .) into k groups and calculate the theoretical frequency for 
each group by using the Poisson expression $f(x) = e^{-\mu}\mu^x/x!$, using the sample 
mean as an estimate of $\mu$. As we estimate only one parameter, we interpret 
this $\chi^2$ according to k - 2 degrees of freedom. 

¹ A proof is given by Cramér (3). 
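
The Poisson case of Example 2 can be sketched as follows. The data are hypothetical, the function names are illustrative, and one simplifying assumption is made that the text does not specify: the highest group is treated as “3 or more” when the theoretical frequencies are computed, but as the value 3 when the sample mean is computed.

```python
import math

def poisson_probability(x, mu):
    """Poisson expression e^(-mu) mu^x / x!."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def poisson_theoretical_frequencies(frequencies):
    """frequencies[x] is the observed count of value x; the last group is open-ended."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in enumerate(frequencies)) / n
    k = len(frequencies)
    probs = [poisson_probability(x, mean) for x in range(k - 1)]
    probs.append(1.0 - sum(probs))          # remaining probability for the last group
    return mean, [n * p for p in probs]

observed = [30, 36, 22, 12]                 # hypothetical counts for x = 0, 1, 2, 3 or more
mean, theoretical = poisson_theoretical_frequencies(observed)
chi2 = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, theoretical))
degrees_of_freedom = len(observed) - 1 - 1  # k - r - 1 with r = 1 estimated parameter

print(round(mean, 2), [round(ft, 1) for ft in theoretical], degrees_of_freedom)
```

With k = 4 groups and one estimated parameter, the statistic is interpreted with 2 degrees of freedom, in line with Theorem 10.3.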


EXERCISES 

10.3.1. Test the goodness of fit of the example in Sec. 9.9, assuming the 
sample has 

a. 1,000 members 

b. 10,000 members 

10.3.2. Test the hypothesis that the following sample of laboratory grades 
can be considered drawn randomly from a normal population: 


x (mid-point)    Frequency 

     60              3 
     59             10 
     58             15 
     57             21 
     56             38 
     55             40 
     54             30 
     53             25 
     52             21 
     51             12 
     50              5 


10.3.3. Are the following sample data on the incidence of incoming telephone 
calls during a 5-min period on each of 110 days (for one telephone) 
consistent with the hypothesis that the calls are individually and collectively 
at random? 

Number of calls    Frequency 

       0                2 
       1                5 
       2               15 
       3               25 
       4               30 
       5               23 
       6               10 


10.3.4. Test the goodness of fit in Exercise 6.4.2. 

10.3.5. It is often assumed that errors of measurement are distributed 
normally. Is the following random sample of errors made in operating tracking 
instruments consistent with this assumption? 

Error    Frequency 

   5          1 
   4          4 
   3          8 
   2         15 
   1         31 
   0         45 
  -1         22 
  -2         12 
  -3         10 
  -4          7 
  -5          5 

10.3.6. Over a period of 10 years a random sample of 135 students winning 
competitive scholarships for undergraduate work is tabulated by dormitories. 
The number of students living in each dormitory during that time is also 
shown. 

Dormitory    Number of students living    Number winning 
                   in dormitory             scholarships 

    A                 2,500                      30 
    B                 1,250                      15 
    C                 2,000                      29 
    D                 1,050                      22 
    E                 2,200                      14 
    F                 2,800                      35 

Are these data consistent with the hypothesis that the dormitory a student 
lives in has had no relation to whether he wins a scholarship? Define carefully 
the population sampled and state the assumptions involved in making 
this test. 


10.4. Test of Independence in a Contingency Table. Suppose 
we are sampling a population each of whose members we classify 
in two ways instead of only one. A table showing the two-way 
classification of a sample from such a population is called a 
contingency table. Let the contingency table shown as Table 10.4.1 
represent the distribution of a sample of 200 randomly drawn 
from a population in which each member has one of the values 
$x_1$ or $x_2$ and also one of the values $y_1$ or $y_2$. 

Table 10.4.1. Contingency Table Showing Distribution of Sample 
of 200 Drawn from Population in Which Each Member Has 
Two Values 

               x_1      x_2      x_1 + x_2 

y_1             20      100         120 
y_2             40       40          80 
y_1 + y_2       60      140         200 

Now suppose that we are interested in the following problem. In the 
population sampled, are the two values independent of each other in the 
sense that, of those members having the value $x_1$, the distribution 
of members with respect to y is the same as the distribution 
with respect to y of those members having the value $x_2$ (and 
therefore the distribution with respect to y of the entire 
population)? For example, if in the population $f(x_1)$ were .56 [so that 
$f(x_2)$ would be .44] and $f(y_1)$ were .25 [so that $f(y_2)$ would be .75], 
then would .56 be divided between $y_1$ and $y_2$ in the ratio of 1 to 
3, as in the entire population? If this were the case, the population 
would have the distribution shown in Table 10.4.2. This 
table was obtained by dividing .56 and .44 each into two parts 
having the ratio 1:3. 

Table 10.4.2. Theoretical Proportions on Hypothesis That 
f(x_1) = .56, f(y_1) = .25, and f(x, y) = f(x)f(y) 

               x_1      x_2      x_1 + x_2 

y_1            .14      .11         .25 
y_2            .42      .33         .75 
y_1 + y_2      .56      .44        1.00 

Note that 

$$f(x_1, y_1) = .14 = (.56)(.25) = f(x_1)f(y_1)$$

and similarly for the other entries in the table. As a matter of fact, 
a general condition for independence is simply $f(x, y) = f(x)f(y)$. 
(Compare with the discussion in Sec. 2.3.) 

It is possible to make a $\chi^2$ test of the hypothesis that $f(x_1)$ and 
$f(y_1)$ have certain specified values and that x and y are independent 
by computing theoretical proportions (probabilities) as indicated 
above and multiplying each proportion by the size of the 
sample to obtain theoretical frequencies. A $\chi^2$ computed in this 
way would have three degrees of freedom. In most situations, 
however, we wish to test the hypothesis of independence without 
including in our hypothesis a specification of the values of $f(x_1)$ 
and $f(y_1)$. By using the marginal totals in our sample divided 
by the size of the sample as estimates of the parameters $f(x_1)$ 
and $f(y_1)$, we can apply Theorem 10.3 and thus make a $\chi^2$ test 
with one degree of freedom. Following this procedure with our 
sample of 200 we obtain the theoretical frequencies shown in 
Table 10.4.3. 

Table 10.4.3. Theoretical Frequencies Computed from 
Marginal Totals of Sample 

               x_1      x_2      x_1 + x_2 

y_1             36       84         120 
y_2             24       56          80 
y_1 + y_2       60      140         200 

We then have 

$$\chi_1^2 = \sum_{i=1}^{4} \frac{(f_{oi} - f_{ti})^2}{f_{ti}} = 25.40$$

By our table we see that a $\chi_1^2$ of this size is significant at the 
.005 level of confidence. 
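
The computation for the sample of 200 can be sketched as follows. The function names are hypothetical; the cell counts are those of Table 10.4.1, and the theoretical frequencies are formed from the marginal totals as described above.

```python
def theoretical_from_marginals(table):
    """Theoretical frequency for each cell: (row total)(column total)/n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_for_table(table):
    """Sum of (f_o - f_t)^2 / f_t over every cell of the contingency table."""
    expected = theoretical_from_marginals(table)
    return sum((fo - ft) ** 2 / ft
               for obs_row, exp_row in zip(table, expected)
               for fo, ft in zip(obs_row, exp_row))

sample = [[20, 100],    # row y_1
          [40, 40]]     # row y_2
expected = theoretical_from_marginals(sample)
chi2 = chi_square_for_table(sample)

print(expected, round(chi2, 2))
```

The theoretical frequencies agree with Table 10.4.3, and the sum agrees with the value 25.40 given in the text.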

Each classification may be qualitative or quantitative 
or ordinal and may involve more than two values. In general, 
we have the sample shown in Table 10.4.4. The estimated 
independent marginal proportions (probabilities) are 
$\frac{n_{i\cdot}}{n}$ (i = 1, 2, . . . , r - 1) and $\frac{n_{\cdot j}}{n}$ (j = 1, 2, . . . , s - 1). 






Table 10.4.4. Sample Contingency Table with rs Cells 

            x_1      x_2      . . .    x_s      Sum over j 

y_1         n_11     n_12     . . .    n_1s     n_1. 
y_2         n_21     n_22     . . .    n_2s     n_2. 
.           .        .                 .        . 
.           .        .                 .        . 
y_r         n_r1     n_r2     . . .    n_rs     n_r. 

Sum over i  n_.1     n_.2     . . .    n_.s     n 

The theoretical probability for cell ij is simply $\frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n}$. The 
theoretical frequency for cell ij is therefore 

$$n \cdot \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

As (r - 1) + (s - 1) = r + s - 2 independent parameters 
have been estimated, and as there are rs cells in all, by Theorem 
10.3 the number of degrees of freedom is 

rs - (r + s - 2) - 1 = rs - r - s + 1 = (r - 1)(s - 1) 

Although a large value of $\chi^2$ relative to the number of degrees
of freedom provides strong evidence that the two variables are
related, it does not indicate the degree of relationship. With a
very large $n$, a contingency table may give a very large $\chi^2$ even
with only a slight difference in the distributions of the rows (or
columns). (The student should be able to prove this as an
exercise.) A quantitative estimate of the degree of relationship
is $\chi^2 / [n(q - 1)]$, in which $q$ is the smaller of the numbers $r$ and $s$.
We state without proof that this measure of relationship can
have values only between 0 and 1.
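A small numerical sketch illustrates both remarks: multiplying every cell by a constant multiplies chi-square by that constant, while the measure of relationship is unchanged. The tables and function names below are hypothetical.

```python
def chi_square(table):
    """Chi-square with theoretical frequencies estimated from marginal totals."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )


def relationship(table):
    """The measure chi-square / [n(q - 1)], q being the smaller of r and s."""
    n = sum(sum(row) for row in table)
    q = min(len(table), len(table[0]))
    return chi_square(table) / (n * (q - 1))


small = [[30, 20], [20, 30]]                      # hypothetical, n = 100
large = [[10 * v for v in row] for row in small]  # same proportions, n = 1000

print(chi_square(small), chi_square(large))
print(relationship(small), relationship(large))
```

A table with all cases on one diagonal, such as `[[50, 0], [0, 50]]`, gives the upper limit of 1.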

EXERCISES 

10.4.1. A sample of 150 is drawn at random from all alumni of a large
university. Each alumnus is classified as either professional or nonprofessional
and also as either satisfied or not satisfied with his work. The following
contingency table is obtained:

               Prof.    Nonprof.
Satis.          50        40
Not satis.      30        30

Test the hypothesis that

a. being professional has no relation to being satisfied.

b. there are as many nonprofessional alumni as professional, and as many
satisfied as not satisfied alumni, and the two variables have no relation to each
other.

c. among professional alumni there are three times as many who are satisfied
as there are who are not satisfied, whereas among nonprofessional alumni
there are as many not satisfied as there are who are satisfied.

10.4.2. In an experiment using the Rorschach ink blot test, each patient is
classified as to whether his profile on the test gives a favorable prognosis for
therapy and also as to whether he has been judged (by clinicians other than
those using the test) to have a good chance of recovery. The obtained
contingency table is shown:

                    Unfav. prog.    Fav. prog.
                    (Rorschach)     (Rorschach)
Good chance              50             75
Not good chance          50             25

a. Test the hypothesis that the prognosis made from the Rorschach is
unrelated to that made independently.

b. Estimate the degree of relationship.

10.4.3. In an experiment on the relation of types of figural after-effects to
measures of rigidity in thinking, each of 200 subjects was tested on both
variables, and the following contingency table was obtained:

                Type of figural after-effect
Rigidity         A      B      C      D
                 14     25     14     17
                 20     15     21     24
                  6     10     15     19

Test the hypothesis that rigidity is independent of type of figural
after-effect.

10.4.4. Derive the following computational formula for a 2 X 2 contingency
table, in which theoretical frequencies are estimated from marginal totals
and the obtained frequencies are a, b, c, and d, with a and d located diagonally
with respect to each other:

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$$

10.4.5. Eighty specimens of a certain species were examined and tabulated
with respect to presence or absence of characteristic A and presence or absence
of characteristic B. The following table was obtained:

          A      Ā
B         14     21
B̄         16     29

If approximately the same proportions were to hold in a larger sample, how
large would the sample have to be to obtain a $\chi^2$ significant at the .01 level
when the hypothesis of independence is tested?

10.5. Tests of Homogeneity. Let us suppose that we have $s$
different samples, each member of each sample having one of $r$
different values, $y_1, y_2, \ldots, y_r$. We can then assemble our
data into a table which is formally identical with that of Sec.
10.4. In this case, however, the marginal totals $n_{\cdot j}$ $(j = 1, 2, \ldots, s)$
are determined in advance or, at any rate, have nothing
to do with estimates of parameters.

We wish to test the hypothesis that all our samples are drawn
from populations having the same distribution. We estimate
this distribution (the distribution of $y$) from the marginal totals
$n_{i\cdot}$ $(i = 1, 2, \ldots, r)$, the last parameter estimated not
being independent of the others. In other words, estimated
$f(y_i) = n_{i\cdot}/n$. We then multiply by the number in the sample to
get each theoretical frequency, that is, the theoretical frequency

Table 10.5.1. Frequency Distributions of s Samples

                        Sample
              1       2       ...    s       Sum (j = 1 to s)
y_1           n_11    n_12    ...    n_1s    n_1.
y_2           n_21    n_22    ...    n_2s    n_2.
...
y_r           n_r1    n_r2    ...    n_rs    n_r.
Sum (i = 1 to r)
              n_.1    n_.2    ...    n_.s    n







for $y_i$ for sample $j$ equals $(n_{i\cdot}/n)\, n_{\cdot j}$. Our theoretical frequencies,
therefore, are the same numerically as in a test for independence.
As in the test for independence,

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(f_{ij} - \hat f_{ij})^2}{\hat f_{ij}}$$

We state without proof¹ that, as before in the test for independence,
the number of degrees of freedom is $(r - 1)(s - 1)$. This
should seem reasonable, because if our theoretical frequencies
were entirely hypothetical (rather than estimated) we could
compute a $\chi^2$ with $r - 1$ degrees of freedom for each sample.
The sum of these $\chi^2$'s would be a $\chi^2$ with $s(r - 1)$ degrees of
freedom. Since we estimated $r - 1$ independent parameters, we
should then have, if Theorem 10.3 applied (actually a slight
modification of this theorem is required),

$$s(r - 1) - (r - 1) = (r - 1)(s - 1) \text{ degrees of freedom}$$

¹ A proof is indicated by Cramér (3).

EXERCISES


10.5.1. A sample of 50 students has been drawn from each of five large
secondary schools. Each member of each sample is given an intelligence test,
and an IQ is obtained. The sample distributions are shown in the following
table.

                       School
IQ            1      2      3      4      5
110-          12      9      4     15     10
100-109       16     23     18     20     15
90-99         12     11     15     10     16
-89           10      7     13      5      9

Test the hypothesis that the samples are drawn from populations having
the same distribution.

10.5.2. In a large community a sample of 50 Democrats and a sample of
50 Republicans have been drawn at random. Each member of each sample
is asked whether he is for, against, or indifferent to a certain policy. The
following table is obtained:

               Republican    Democrat
For                20           10
Against            15           30
Indifferent        15           10

Test the hypothesis that Democrats and Republicans in the community
are homogeneous with respect to their potential replies to the question.

10.5.3. Subjects are divided into two "types" on the basis of a personality
inventory and then tested on the Rorschach, each subject obtaining scores on
form, color, movement, whole, and detail. There are 50 subjects in all. The
following table is obtained, in which each entry is the sum of scores for subjects.

               Type A    Type B
Form            257       201
Color            31        50
Movement        108       183

Why would it be incorrect to compute chi square for this table? Can the
table be modified in any way so as to justify a chi-square test of independence?

10.5.4. In an experiment on effects of diet on behavior, one group of 30 rats
was run under conditions of vitamin B deficiency; another group of 30 was
run under conditions of an adequate diet. Each rat was tabulated as showing
or not showing each of the following three kinds of behavior: retracing,
perseverating, giving up. The following table shows the number of rats in
each group showing each of these kinds of behavior:

                  Deficiency    Normal
Retracing             15          10
Perseverating         20           7
Giving up             12           3
None of these          8          16

Should chi square be used to test for homogeneity? Explain your answer.

10.5.5. Analyze the data of Exercise 10.3.6, considering the students from
each dormitory as a sample and testing for homogeneity. Why is the value
of $\chi^2$ practically the same as that obtained previously, despite the much larger
sample?

10.5.6. Inspection of a sample of 300 products yields 15 defectives. Test
the hypothesis that the proportion of defectives being turned out is .04.
Show the equivalence of the $\chi^2$ test and the test based on the central-limit
theorem for this type of situation (that is, a two-cell table).

10.6. Test for Variance of a Normal Population. We state
without proof¹ the following theorem.

Theorem 10.6. Consider a sample of size $n$ drawn at random
from a population having the distribution $N(\mu, \sigma^2)$. Then
$ns^2/\sigma^2$ is distributed as $\chi^2_{n-1}$, where $s^2$ is the sample variance.

To test the hypothesis that a given sample is drawn from a
normally distributed population with variance $c$, we compute
$ns^2/c$ and use the tables for $f(\chi^2_{n-1})$.

Further, a confidence interval (at the $p$ level) from $\sigma_l^2$ (lower
bound) to $\sigma_u^2$ (upper bound) can be obtained by the following
relations:

$\sigma_l^2 = ns^2/d$, in which $d$ is obtained by finding the value $d$ for
which $\int_d^{\infty} f(\chi^2_{n-1})\, d\chi^2 = p/2$.

$\sigma_u^2 = ns^2/b$, in which $b$ is obtained by finding the value $b$ for
which $\int_0^{b} f(\chi^2_{n-1})\, d\chi^2 = p/2$.

The values of $b$ and $d$ are, of course, obtained from tables of
$f(\chi^2_{n-1})$. As an exercise the student should deduce these relations
from Theorem 10.6.

¹ A proof is given by Hoel (8).
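The two relations can be sketched numerically. In place of printed tables, the sketch below computes chi-square percentage points by a series for the distribution function and bisection; this substitution, the function names, and the data are all assumptions made for illustration. It also assumes, as Theorem 10.6 implies, that $ns^2$ is the sum of squared deviations from the sample mean.

```python
import math


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), by a power series."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= half_x / (a + k)
        total += term
        k += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi2_point(prob, df):
    """Value below which the chi-square distribution has probability prob."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_interval(sum_sq_dev, n, p):
    """Bounds n s^2 / d and n s^2 / b, with sum_sq_dev standing for n s^2."""
    d = chi2_point(1.0 - p / 2.0, n - 1)  # upper-tail area p/2
    b = chi2_point(p / 2.0, n - 1)        # lower-tail area p/2
    return sum_sq_dev / d, sum_sq_dev / b


# Hypothetical data: 16 observations, sum of squared deviations 120.
lower, upper = variance_interval(sum_sq_dev=120.0, n=16, p=0.05)
print(lower, upper)
```

The larger percentage point d gives the lower bound and the smaller point b gives the upper bound, as in the relations above.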


EXERCISES 

10.6.1. A sample of 10 is drawn at random from a normally distributed
population. The sum of squared deviations from the mean is 50. 

a. Test the hypothesis that the variance of the population is 5. 

b. Find a .05 confidence interval for the population variance. 

10.6.2. A sample of 20 is drawn at random from a normally distributed
population. The sum of the values is 60; the sum of squares of the values 
is 200. 

a. Test the hypothesis that the population variance is 3. 

b. Find a .01 confidence interval for the population variance. 


Chapter 11 

"STUDENT’S” t DISTRIBUTIONS 


11.1. Definition of $t_n$ and the Distribution of $t_n$. In Chap. 9
we found satisfactory ways of testing hypotheses about (and 
finding confidence intervals for) means of populations in the 
following cases: 

1. When the sample is large (“large” meaning greater than 
about 30). 

2. When the sample is large or small, but from a normal 
population with known variance. 

It would be highly desirable to find a satisfactory method for
testing hypotheses about population means when the sample is
small from a population with completely unknown distribution.
Such a method has never been found and probably does not
exist; however, a satisfactory method has been found for the
case in which the sample is drawn from a normal population with
unknown variance. We cannot use the variance of a small
sample as an estimate of the population variance, as in the case
of large samples, because the standard error of the sample variance
is too large with small samples. We now describe a
method which does not involve estimating the population variance,
first found by W. S. Gosset, writing under the name
“Student,” and later proved rigorously by R. A. Fisher.

Let $X$ be distributed according to $\phi(0,1)$ and $Y$ be independently
distributed according to $\chi^2_n$. We define $t_n$ as

$$t_n = \frac{X}{\sqrt{Y/n}}$$


We state without proof¹ that the distribution of $t_n$ is given by the
following theorem.

¹ A proof is given in the Appendix, Sec. B.23.

Theorem 11.1

$$f(t_n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\; \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} \qquad -\infty < t < \infty$$

in which $n$ is called “the number of degrees of freedom” and $\Gamma$
means the gamma function as described in Sec. 10.1. Note that 



Since $(1 + t^2/n)^{(n+1)/2}$ has a minimum value for $t = 0$, $f(t_n)$ has a
maximum value for $t = 0$; that is, $f(0)$ is the greatest density in
the $t_n$ distribution. Further, as $t^2 = (-t)^2$, and as $t$ appears in
$f(t_n)$ only as $t^2$, the $t$ distributions are symmetrical about 0, as
illustrated in Fig. 11.1.1. This symmetry about 0 implies that
for all values of $n$ for which $\mu$ [that is, $E(t_n)$] exists, $\mu = 0$. For
$n = 1$, $\mu$ does not exist.¹ The symmetry about 0 also implies
that all existing odd moments about the mean are 0. We state
without proof² that the variance $\sigma^2$ is equal to $n/(n - 2)$ for
$n \geq 3$, and also that as $n$ gets larger the distribution of $t_n$
approaches a normal distribution with 0 mean and unit variance,
that is, $\lim_{n \to \infty} f(t_n) = \phi(0,1)$.

Fig. 11.1.1. The density functions $f(t_5)$ and $f(t_1)$, compared with $\phi(0,1)$. (Reprinted
from W. J. Dixon and F. J. Massey, Introduction to statistical analysis,
McGraw-Hill, 1951, with permission of the authors and publishers.)

¹ This statement means that $\int_{-\infty}^{\infty} t_1 f(t_1)\, dt_1$ does not have a finite value, that is,
that $\sum t_1 f(t_1)\, \Delta t_1$ does not approach a limit as $\Delta t_1$ approaches 0.

² A proof is given by Cramér (3).

Tables of the distribution of $t_n$ for various sizes of $n$ are given
in Appendix D.
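The properties just stated can be checked numerically from the density of Theorem 11.1. The sketch below is illustrative only: the function names are hypothetical, and the variance is approximated by a midpoint sum over a wide but finite range rather than obtained exactly.

```python
import math


def t_density(t, n):
    """Density of t with n degrees of freedom, as given in Theorem 11.1."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log(1 + t * t / n))


def normal_density(t):
    """Density of the normal distribution with 0 mean and unit variance."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def variance_numeric(n, limit=200.0, steps=200000):
    """Midpoint-sum approximation to the integral of t^2 f(t_n)."""
    h = 2 * limit / steps
    total = 0.0
    for i in range(steps):
        t = -limit + (i + 0.5) * h
        total += t * t * t_density(t, n)
    return total * h


print(t_density(0.0, 1), t_density(1.5, 5), t_density(-1.5, 5))
print(variance_numeric(5), 5 / 3)
print(t_density(1.0, 1000), normal_density(1.0))
```

The second line compares the numerical variance for 5 degrees of freedom with n/(n - 2), and the third shows the density for a large n lying close to the normal density.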


EXERCISES 

11.1.1. An experimenter has tested a certain hypothesis with two independent
samples. For one sample, the experiment was designed so that $\chi^2_{10}$
was appropriate, and the value obtained was 5. For the other sample, the
critical ratio was the appropriate statistic, and the value obtained was 1.5.
How can these two results be used simultaneously to test the hypothesis?
Why does this test have a very low power, in fact so low that the experimenter
should have considerable difficulty in explaining his results?

11.1.2. Using Theorems 10.6 and 11.1, find the distribution of

$$\frac{m - \mu}{s/\sqrt{n - 1}}$$

in which $m$ and $s$ are the sample mean and standard deviation and $\mu$ is the
population mean, assuming that $m$ and $s$ are independently distributed and
that the population sampled is normally distributed.

11.2. Testing Hypotheses about Population Means and Finding
Confidence Intervals with Small Samples from Normal
Populations with Unknown Variances. Consider a sample of
size $n$ drawn from $N(\mu, \sigma^2)$. As we saw in Sec. 9.4, $\dfrac{m - \mu}{\sigma/\sqrt{n}}$ has
the distribution $N(0,1)$. We state without proof¹ that $s^2 n/\sigma^2$, in
which $s^2$ is the sample variance, is independently distributed as
$\chi^2_{n-1}$. Therefore

$$\frac{(m - \mu)\sqrt{n}/\sigma}{\sqrt{(s^2 n)/[\sigma^2 (n - 1)]}} = \frac{m - \mu}{s/\sqrt{n - 1}}$$

is distributed as $t_{n-1}$. As a corollary we have

$$\text{prob}\left(m \pm \frac{bs}{\sqrt{n - 1}} \text{ includes } \mu\right) = \int_{-b}^{b} f(t_{n-1})\, dt$$

We stated in Sec. 11.1 that the method which we would 
describe does not involve estimating the population variance. 

¹ A proof is given by Mood (9). We had already stated in Theorem 10.6 that
$s^2 n/\sigma^2$ has the $\chi^2$ distribution, but not that it is independent of $\dfrac{m - \mu}{\sigma/\sqrt{n}}$.

On the other hand, it is often stated that $t$ is the ratio of a
normally distributed variate to an unbiased estimate of its
sampling error. It is true that $s^2/(n - 1)$ is an unbiased¹ estimate
of $\sigma^2/n$, the variance of $m - \mu$, and that therefore $s^2 n/(n - 1)$
is an unbiased estimate of the population variance $\sigma^2$.
With small $n$, however, this estimate, though unbiased, is by no
means a good estimate in the sense of tending to have a small
sampling error, and in using $t$ with small $n$ we are not making
the assumption that the denominator is a good estimate in this
sense any more than we are assuming that with a small sample
$m$ is a good estimate (in this sense) of $\mu$. With small $n$, both
numerator and denominator of $t$ tend to have large sampling
errors, yet the distribution of the ratio is known just as exactly
as with large $n$.

In Sec. 9.8 we considered the problem of testing the hypothesis
that the means of two populations have the same value (or differ
by a specified amount). We found a way of making such a test 
(and of finding confidence intervals) with large samples. With 
small samples, however, we must make use of Theorem 11.2, 
which is given below. 

The corollary to Theorem 9.8.1 states that if $m_x$ and $m_y$ are the
means of samples drawn from $N(\mu_x, \sigma_x^2)$ and $N(\mu_y, \sigma_y^2)$ respectively,
then

$$f(m_x - m_y) = \phi\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$$

A special case is that in which $\sigma_x^2 = \sigma_y^2$, so that we have

$$f(m_x - m_y) = \phi\left(\mu_x - \mu_y,\ \frac{\sigma^2}{n_x} + \frac{\sigma^2}{n_y}\right)$$

in which $\sigma^2$ is the variance of each population. In this case

$$f\left[\frac{(m_x - m_y) - (\mu_x - \mu_y)}{\sqrt{\sigma^2/n_x + \sigma^2/n_y}}\right] = \phi(0,1)$$

Also, $(n_x s_x^2 + n_y s_y^2)/\sigma^2$ is distributed independently as $\chi^2_{n_x + n_y - 2}$.
Therefore, by Theorem 11.1 we have the following:

¹ A statistic $z$ is an unbiased estimate of a parameter $\zeta$ if and only if $E(z) = \zeta$.

Theorem 11.2

$$\frac{[(m_x - m_y) - (\mu_x - \mu_y)] / \sqrt{\sigma^2/n_x + \sigma^2/n_y}}{\sqrt{(n_x s_x^2 + n_y s_y^2)/[\sigma^2 (n_x + n_y - 2)]}} = \frac{[(m_x - m_y) - (\mu_x - \mu_y)] \sqrt{n_x + n_y - 2}}{\sqrt{n_x s_x^2 + n_y s_y^2}\, \sqrt{1/n_x + 1/n_y}}$$

is distributed as $t_{n_x + n_y - 2}$.

Corollary

$$\text{prob}\left[(m_x - m_y) \pm \frac{b \sqrt{n_x s_x^2 + n_y s_y^2}\, \sqrt{1/n_x + 1/n_y}}{\sqrt{n_x + n_y - 2}} \text{ includes } (\mu_x - \mu_y)\right] = \int_{-b}^{b} f(t_{n_x + n_y - 2})\, dt$$


We use these results in exactly the same way that we used the 
corresponding results in Sec. 9.8, whenever we have two small 
samples from normal populations and we are willing to assume 
that the variances of the two populations are equal. Of course we 
do not have to make this assumption; we have the alternative of 
including the hypothesis of equal variances in the hypothesis 
that we are testing. In this case, however, we have to remember 
that a significant result does not permit us to assert that the 
population means are not equal. 
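The statistic of Theorem 11.2 can be computed directly from two samples. The sketch below is an illustration under the equal-variance assumption; the function names and the two samples are hypothetical, and the sample variance uses the divisor n, following the convention of this text.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance_n(values):
    """Sample variance with divisor n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def two_sample_t(x, y, hypothesized_diff=0.0):
    """Statistic of Theorem 11.2 and its number of degrees of freedom."""
    nx, ny = len(x), len(y)
    numerator = (mean(x) - mean(y) - hypothesized_diff) * math.sqrt(nx + ny - 2)
    denominator = (math.sqrt(nx * variance_n(x) + ny * variance_n(y))
                   * math.sqrt(1 / nx + 1 / ny))
    return numerator / denominator, nx + ny - 2


x = [12.1, 11.4, 13.0, 12.6, 11.9]        # hypothetical sample
y = [10.8, 11.2, 10.1, 11.7, 10.9, 11.0]  # hypothetical sample
t, df = two_sample_t(x, y)
print(round(t, 3), "with", df, "degrees of freedom")
```

The value of t is then referred to the t tables with the stated degrees of freedom.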

There is one fairly satisfactory way, however, to avoid making
the assumption of equal variances. If the members of the
sample are paired at random (which necessitates $n_x = n_y$ or else
the discarding of some members of the larger sample at random),
and in each pair one is subtracted from the other, we can use
Theorem 9.8.1, which states that if $X$ is distributed according
to $\phi(\mu_x, \sigma_x^2)$ and $Y$ is independently distributed according to
$\phi(\mu_y, \sigma_y^2)$, then $f(x - y) = \phi(\mu_x - \mu_y, \sigma_{x-y}^2)$, in which

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2$$

Therefore $f(m_{x-y}) = \phi(\mu_x - \mu_y, \sigma_{x-y}^2/n)$, remembering that

$$m_{x-y} = m_x - m_y$$

Therefore

$$f\left[\frac{m_{x-y} - (\mu_x - \mu_y)}{\sigma_{x-y}/\sqrt{n}}\right] = \phi(0,1)$$

Furthermore, $f(s_{x-y}^2 n/\sigma_{x-y}^2) = f(\chi^2_{n-1})$, in which $s_{x-y}^2$ is the variance
of the sample differences. Therefore

$$f\left[\frac{m_{x-y} - (\mu_x - \mu_y)}{s_{x-y}/\sqrt{n - 1}}\right] = f(t_{n-1})$$

Corollary

$$\text{prob}\left[m_{x-y} \pm \frac{b\, s_{x-y}}{\sqrt{n - 1}} \text{ includes } (\mu_x - \mu_y)\right] = \int_{-b}^{b} f(t_{n-1})\, dt$$


In many cases there is a certain obvious, nonrandom, pairing 
of our sample values. For example, suppose we have a sample 
of 7 reaction times under condition A and 7 reaction times under 
condition B , obtained on the same 7 individuals. In this case it 
seems obvious that we should pair the reaction times according 
to the individuals, as indicated in the accompanying table: 


Individual    RT(A)    RT(B)    RT(A) - RT(B)
    1         .245     .261        -.016
    2         .280     .277         .003
    3         .252     .260        -.008
    4         .226     .240        -.014
    5         .317     .327        -.010
    6         .274     .270         .004
    7         .260     .265        -.005


Even if we can assume that RT(A) and RT(B) are each
normally distributed, we cannot thereby assume that RT(A) -
RT(B) will be normally distributed, because our theorem about
the difference between two normally distributed variates (Theorem
9.8.1) holds only for independent variates, and there is every
reason to consider RT(A) and RT(B) as nonindependent. If,
however, we make the separate assumption that the difference
RT(A) - RT(B) is normally distributed, then we can apply the
$t$ test exactly as described in the foregoing, in which $\sigma_{x-y}^2$ is the
variance of the population of differences and is not in general
equal to $\sigma_x^2 + \sigma_y^2$. If it is assumed that the difference is normally
distributed, it is not necessary to assume that the distribution
of each of the variates is normal. In many cases in
which the experimenter has a choice in the design of his experiment,
the pairing method is considerably more precise, as
$\sigma_{x-y}^2$ is often considerably smaller than $\sigma_x^2 + \sigma_y^2$.
It is strictly incorrect, moreover, to match two samples
by pairing individuals or any other method and then treat the
data as though they were independent samples. Such a treatment,
besides violating an assumption, can result in the failure
to obtain a significant difference between means when the correct
analysis would have done so.
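The paired analysis can be sketched with the seven reaction times tabulated above, testing the hypothesis that the population mean difference is 0. The variable names are illustrative; the standard deviation of the differences uses the divisor n, so that the statistic is the mean difference divided by $s_{x-y}/\sqrt{n - 1}$.

```python
import math

# Reaction times from the table above, paired by individual.
rt_a = [.245, .280, .252, .226, .317, .274, .260]
rt_b = [.261, .277, .260, .240, .327, .270, .265]

diffs = [a - b for a, b in zip(rt_a, rt_b)]
n = len(diffs)
m = sum(diffs) / n                                   # mean difference
s = math.sqrt(sum((d - m) ** 2 for d in diffs) / n)  # divisor n
t = m / (s / math.sqrt(n - 1))                       # hypothesized difference 0

print(round(t, 3), "with", n - 1, "degrees of freedom")
```

For these data t comes to about -2.23 with 6 degrees of freedom, which is then referred to the t tables.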


EXERCISES 

11.2.1. The anxiety level of 10 students was measured physiologically
before and after an examination. The following data show the results
(units are arbitrary):

Student    Before    After
   1         10        8
   2          7        6
   3          8        8
   4          5        4
   5          4        3
   6          6        7
   7          2        2
   8          5        4
   9          7        5
  10          6        6

Test the hypothesis that the before and after scores can be considered drawn
from populations having the same mean. What assumptions does your test
make? Is it possible to make two t tests or only one in this case?

11.2.2. In a study using identical twins, one twin is given a drug and then
an intelligence test while under the influence of the drug; the other twin is
given an intelligence test under normal conditions. The data are as follows:

Pair    Twin A    Twin B (drug)
 1       135         122
 2       115         110
 3       149         143
 4       125         116
 5       158         148
 6       132         121

Test the hypothesis that the drug has no effect on intelligence-test score.
What assumptions does your test make? Can you make one or two
t tests?

11.2.3. In a gasoline mileage contest, 10 cars of one make have a mean of
20 miles per gal and a variance of 36; a sample of 9 cars of a rival make have a
mean of 30 and a variance of 32. Test the hypothesis that the two makes
generally give equal mileages. What assumptions does your test make?
Would it be possible to make two t tests or only one with the data obtained?
Find the .05 confidence interval for the mileage given by the first make.

11.3. A Criterion for the Discarding of Exceptional Observations
and the Testing of a Difference between the Mean of a
Subsample and the Mean of the Sample. In many sets of
observations the experimenter or research worker is tempted to
discard certain observations which lie so far from the mean that
they appear to have come from a population other than that
which he is interested in sampling. In some cases there may in
addition be other reasons for suspecting these observations of
being irrelevant to the sample. Yet there is always a danger in
discarding such observations without a statistical criterion, for
whether he intends to do so or not the research worker may be
selecting his data in a way which biases the final conclusion.

A useful statistical criterion for the rejection of atypical
observations, when sampling from a normal population, can be
obtained from the following theorem, which we state without
proof.¹

Theorem 11.3.1. Consider a sample of size $n$ drawn from
$N(\mu, \sigma^2)$. Let $w = (x_i - m)/s$, in which $x_i$ is the value of the
$i$th member of the sample and $m$ and $s$ are the sample mean and
standard deviation respectively. Then

$$f\left(\frac{w \sqrt{n - 2}}{\sqrt{n - 1 - w^2}}\right) = f(t_{n-2})$$

Caution must be exercised in the use of this theorem, because
with a sample of size $n$ $(n \geq 3)$ the probability that at least
one member of the sample will give a value of
$w\sqrt{n - 2}/\sqrt{n - 1 - w^2}$ which is significant (by the $t$ tables) at the $p$
level is, of course, greater than $p$, even if all members of the
sample are actually drawn from the same population. Therefore,
the level of confidence required for the discarding of an
observation should be low.

A generalization of Theorem 11.3.1 is as follows:¹

Theorem 11.3.2. Let $m_k$ be the mean of $k$ members of a
sample of size $n$ drawn from $N(\mu, \sigma^2)$, where $1 \leq k < n$. Let
$w = (m_k - m)/s$. Then

$$f\left(\frac{w \sqrt{k(n - 2)}}{\sqrt{n - k - k w^2}}\right) = f(t_{n-2})$$

This theorem is appropriate for testing the significance of the
difference between the mean of a subsample and the mean of the
sample.

¹ A proof is given by Cramér (3).
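The criterion of Theorem 11.3.1 can be sketched as follows. The function name and the observations are hypothetical, and the sample standard deviation uses the divisor n, following the convention of this text. Each value returned would be referred to the t tables with n - 2 degrees of freedom, bearing in mind the caution stated above.

```python
import math


def rejection_statistics(sample):
    """For each x_i return w sqrt(n - 2) / sqrt(n - 1 - w^2), w = (x_i - m)/s."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / n)  # divisor n
    stats = []
    for x in sample:
        w = (x - m) / s
        stats.append(w * math.sqrt(n - 2) / math.sqrt(n - 1 - w * w))
    return stats


sample = [9.8, 10.1, 10.4, 9.9, 10.2, 13.5]  # hypothetical observations
stats = rejection_statistics(sample)
for x, value in zip(sample, stats):
    print(x, round(value, 2))
```

The observation farthest from the sample mean gives the statistic of largest absolute value.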
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12.1. Definition and Properties of Bivariate Distributions 

Definition. A bivariate population is a population in which 
each member has two ordered values, that is, it is a class of 
ordered pairs [(x,y), (X,Y)] such that the second member (X,Y) 
of each pair is a member 1 of a set and the first member of each 
pair is the ordered pair of values of that member of the set. 

Definition. A bivariate frequency function is a class of
ordered pairs $[f(x,y), (x,y)]$ such that the second member of each
pair is a pair of values and the first member is the proportion (or
proportion density) of the population having that pair of values.

Examples. 1. Suppose we plan to toss four ordinary dice, two red and two
blue. The value of each toss is the ordered pair $(x,y)$ such that $x$ is the sum
of the red faces turned up and $y$ is the sum of the blue faces turned up. The
fr.f. of the population of all possible ways (there are $6^4 = 1{,}296$ in all) in
which the four dice can land is a bivariate fr.f., in which $f(2,2) = 1/1296$,
$f(3,2) = 2/1296$, $f(3,3) = 4/1296$, $f(7,7) = 36/1296$, etc.

2. The distribution given by $f(x,y) = (x^2 + y)/32$ for $x = 0, 1, 2, 3$ and
$y = 0, 1$ is a bivariate distribution. This distribution is given in Table 12.1.1,
in which each entry is a proportion $f(x,y)$.

Table 12.1.1. Discrete Bivariate Distribution

y = 1      1/32     2/32     5/32     10/32
y = 0      0        1/32     4/32      9/32
x          0        1        2         3

¹ The reason for using a pair of letters to denote a member will become apparent
later in the discussion.

3. The continuous distribution given by $f(x,y) = 3x^2 y + x$ for $0 < x < 1$
and $0 < y < 1$ is a bivariate distribution. The density at the point (1,1),
for example, is 4.

4. Suppose we have a population of size $N$, in which there are $N_1$ A's, $N_2$ B's,
and $N_3$ $(= N - N_1 - N_2)$ C's. What is the fr.f. of the population of all possible
samples of size $n$, the value of each sample being the pair $(x,y)$ in which $x$
is the number of A's and $y$ is the number of B's? The student will recognize
this as the hypergeometric case with two variables, and should be able to prove
for himself that

$$f(x,y) = \frac{C^{N_1}_{x}\, C^{N_2}_{y}\, C^{N - N_1 - N_2}_{n - x - y}}{C^{N}_{n}}$$

in which $C^{a}_{b}$ is understood to be 0 if $b > a$ or $b < 0$.

5. Consider a population as described in Example 4. What is the fr.f. of
all possible sequences of $n$ draws, replacing each member after drawing it?
The student can prove as an exercise that

$$f(x,y) = P^{n}_{x,y,n-x-y} \left(\frac{N_1}{N}\right)^{x} \left(\frac{N_2}{N}\right)^{y} \left(\frac{N - N_1 - N_2}{N}\right)^{n-x-y}$$

in which $P^{n}_{x,y,n-x-y}$ is the number of discernibly different permutations of $n$
objects of which $x$ are of one kind, $y$ are of a second kind, and $n - x - y$ are
of a third kind. The student should also be able to prove that

$$P^{n}_{x,y,n-x-y} = \frac{n!}{x!\, y!\, (n - x - y)!}$$
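Examples 1 and 4 lend themselves to direct enumeration. The sketch below counts all outcomes for the four dice and then checks that the two-variable hypergeometric fr.f. sums to 1. The function name and the population sizes in the second part are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Example 1: all 6**4 equally likely ways two red and two blue dice can land.
counts = Counter()
for r1, r2, b1, b2 in product(range(1, 7), repeat=4):
    counts[(r1 + r2, b1 + b2)] += 1
total = sum(counts.values())
dice_f = {xy: Fraction(c, total) for xy, c in counts.items()}
print(total, dice_f[(2, 2)], dice_f[(3, 2)], dice_f[(3, 3)], dice_f[(7, 7)])


# Example 4: hypergeometric case with two variables.
def hypergeom_f(x, y, N, N1, N2, n):
    z = n - x - y  # number of C's in the sample
    if x < 0 or y < 0 or z < 0:
        return Fraction(0)
    # math.comb returns 0 when more items are requested than are available.
    return Fraction(comb(N1, x) * comb(N2, y) * comb(N - N1 - N2, z), comb(N, n))


N, N1, N2, n = 10, 4, 3, 5  # hypothetical population and sample sizes
check = sum(hypergeom_f(x, y, N, N1, N2, n)
            for x in range(n + 1) for y in range(n + 1))
print(check)
```

Exact fractions are used so that the comparison with the values stated in Example 1 involves no rounding.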


Obviously $\sum_{\text{all } x, y} f(x,y) = 1$ for a discrete bivariate distribution.
Note that this summation can be performed by holding $y$ constant
and summing for all values of $x$, then taking another value
of $y$ and again summing, etc., until all pairs of values $(x,y)$ have
been taken. This process can be represented by the double
summation $\sum_{\text{all } x} \sum_{\text{all } y} f(x,y)$.

For the continuous case, consider the probability spread out in
the xy plane so that the probability density at a given point in
the plane is $f(x,y)$. This means that the probability is spread so
that

$$\lim_{\Delta x,\, \Delta y \to 0} \left(\frac{\text{prob in rectangle } \Delta x\, \Delta y}{\Delta x\, \Delta y}\right) = f(x,y)$$

This implies that to approximate the amount of probability in
a given rectangle we can take the area, $\Delta x\, \Delta y$, times the density
of any point within the rectangle, the approximation improving
as we take smaller and smaller sizes of $\Delta x$ and $\Delta y$. Suppose that
we wish to find the probability in the rectangle $a < x < b$,
$c < y < d$. We can divide this rectangle into $n$ equal horizontal
intervals each of length $\Delta x$ and $m$ equal vertical intervals each of
length $\Delta y$ and form the sum $\sum_{j=1}^{m} \sum_{i=1}^{n} f(x_i, y_j)\, \Delta x\, \Delta y$, in which $x_i$ is
any value of $x$ in the $i$th horizontal interval and $y_j$ is any value of
$y$ in the $j$th vertical interval. We then have

$$\lim_{\Delta x,\, \Delta y \to 0} \left(\sum_{j=1}^{m} \sum_{i=1}^{n} f(x_i, y_j)\, \Delta x\, \Delta y\right) = \text{prob}\,(a < x < b,\ c < y < d)$$

[Figure: the rectangle $a < x < b$, $c < y < d$, with each point in the xy plane having the density $f(x,y)$.]

We abbreviate this limit as $\int_c^d \int_a^b f(x,y)\, dx\, dy$. In actually performing
the integration, we integrate first with respect to $x$,
treating $y$ as though it were a constant, and then we integrate
with respect to $y$. As an example, we have

$$\int_c^d \int_a^b (3x^2 y + x)\, dx\, dy = \int_c^d \left(b^3 y + \frac{b^2}{2} - a^3 y - \frac{a^2}{2}\right) dy$$

$$= \frac{b^3 d^2}{2} + \frac{b^2 d}{2} - \frac{a^3 d^2}{2} - \frac{a^2 d}{2} - \frac{b^3 c^2}{2} - \frac{b^2 c}{2} + \frac{a^3 c^2}{2} + \frac{a^2 c}{2}$$
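The limit process can be imitated numerically by forming the double sum with small intervals and comparing it with the result of integrating. The sketch below is an illustration only: the function names are hypothetical, the midpoint of each interval is taken as the value $(x_i, y_j)$, and the rectangle in the second comparison is arbitrary.

```python
def density(x, y):
    """The density 3x^2 y + x used in the example."""
    return 3 * x * x * y + x


def double_sum(a, b, c, d, n=400, m=400):
    """Sum of f(x_i, y_j) dx dy over an n by m grid of interval midpoints."""
    dx, dy = (b - a) / n, (d - c) / m
    total = 0.0
    for j in range(m):
        y = c + (j + 0.5) * dy
        for i in range(n):
            x = a + (i + 0.5) * dx
            total += density(x, y) * dx * dy
    return total


def integrated(a, b, c, d):
    """Result of integrating first on x from a to b, then on y from c to d."""
    def inner(y):
        return b**3 * y**2 / 2 + b**2 * y / 2 - a**3 * y**2 / 2 - a**2 * y / 2
    return inner(d) - inner(c)


print(double_sum(0, 1, 0, 1), integrated(0, 1, 0, 1))
print(double_sum(0.2, 0.7, 0.1, 0.9), integrated(0.2, 0.7, 0.1, 0.9))
```

Over the whole range $0 < x < 1$, $0 < y < 1$ both computations give 1 to within the accuracy of the sum.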
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Obviously for any continuous fr.f. we have 


$$\int_{R_y} \int_{R_x} f(x,y)\, dx\, dy = 1$$

in which $R_x$ and $R_y$ indicate the entire range of $x$ and $y$ respectively.
Example 3 above was the continuous bivariate distribution
given by $f(x,y) = 3x^2 y + x$ for $0 < x < 1$, $0 < y < 1$.

The student will find that $\int_0^1 \int_0^1 (3x^2 y + x)\, dx\, dy = 1$.

In any bivariate distribution $f(x,y)$ the distribution of $X$ is
called the marginal distribution of $X$ and that of $Y$ is called the
marginal distribution of $Y$. In the case of a discrete bivariate
fr.f. the marginal distribution of $X$ is obtained by summing
$f(x,y)$ for all values of $Y$, holding the value of $X$ constant,
and doing this for each value of $X$. For example, suppose
the $X$ values are $x_1, x_2, \ldots, x_k$. Then $f(x_1) = \sum_{\text{all } y} f(x_1, y)$;
$f(x_2) = \sum_{\text{all } y} f(x_2, y)$; $\ldots$; $f(x_k) = \sum_{\text{all } y} f(x_k, y)$. The marginal
distribution of $Y$ is obtained analogously.

In the case of a continuous bivariate fr.f., the marginal distribution
of $X$ is obtained by integrating $f(x,y)$ on $y$, treating $x$
as though it were a constant; that is,

$$f(x) = \int_{R_y} f(x,y)\, dy$$

Examples. 1. Let a discrete bivariate distribution be given by

f(0,0) = .1      f(1,0) = .1
f(0,1) = .2      f(1,1) = .1
f(0,2) = .3      f(1,2) = .2

The marginal distribution of $X$ is given by

$$f(0) = \sum_{y=0}^{2} f(0,y) = .6 \qquad f(1) = \sum_{y=0}^{2} f(1,y) = .4$$

The marginal distribution of $Y$ is given by

$$f(0) = \sum_{x=0}^{1} f(x,0) = .2 \qquad f(1) = \sum_{x=0}^{1} f(x,1) = .3 \qquad f(2) = \sum_{x=0}^{1} f(x,2) = .5$$

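2. The marginal distribution of $X$ in the discrete bivariate distribution given
as Example 1 of a bivariate distribution is simply $f(2) = 1/36$; $f(3) = 2/36$;
$f(4) = 3/36$; $f(5) = 4/36$; $f(6) = 5/36$; $f(7) = 6/36$; $f(8) = 5/36$; etc. (The
student should obtain this as an exercise.)

3. Consider the bivariate distribution given by $f(x,y) = 3x^2 y + x$ when
$0 < x < 1$; $0 < y < 1$. Then

$$f(x) = \int_0^1 (3x^2 y + x)\, dy = \left.\left(\frac{3x^2 y^2}{2} + xy\right)\right|_0^1 = \frac{3x^2}{2} + x$$

Note that

$$\int_0^1 f(x)\, dx = \int_0^1 \left(\frac{3x^2}{2} + x\right) dx = \left.\left(\frac{x^3}{2} + \frac{x^2}{2}\right)\right|_0^1 = 1$$

Similarly,

$$f(y) = \int_0^1 (3x^2 y + x)\, dx = \left.\left(x^3 y + \frac{x^2}{2}\right)\right|_0^1 = y + \frac{1}{2}$$

Note that

$$\int_0^1 f(y)\, dy = \int_0^1 \left(y + \frac{1}{2}\right) dy = \left.\left(\frac{y^2}{2} + \frac{y}{2}\right)\right|_0^1 = 1$$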

For the marginal distributions, we have the usual moments
and other parameters or statistics. We also have parameters
and statistics depending upon the joint distribution. For
example, we have the moments $E[(X - c_1)^{k_1} (Y - c_2)^{k_2}]$, in
which $c_1$, $c_2$ are any constants and $k_1$, $k_2$ are any positive integers
or 0 (note that when one of the $k$'s is 0 we have a moment of one
of the marginal distributions). In the discrete case a moment is

$$\sum_{\text{all } y} \sum_{\text{all } x} (x - c_1)^{k_1} (y - c_2)^{k_2} f(x,y)$$

in the continuous case,

$$\int_{R_y} \int_{R_x} (x - c_1)^{k_1} (y - c_2)^{k_2} f(x,y)\, dx\, dy$$
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EXERCISES 

12.1.1. A bowl has 3 red, 2 white, and 1 black chip in it. 

a. Find the distribution of the population consisting of all possible samples
of 3 chips, each sample having as values the number of red chips and the number
of white chips.

b. Find the marginal distribution of the number of red chips, also of the
number of white chips.

c. If $X$ is the number of red chips and $Y$ is the number of white chips in a
sample, find $E\{[X - E(X)][Y - E(Y)]\}$.

12.1.2. A bowl has 2 red, 1 white, and 2 black chips in it.

a. Find the distribution of all possible samples of 2 chips, each sample
having as values the number of red chips and the number of white chips.

b. Find each of the marginal distributions.

c. If $X$ is the number of red chips and $Y$ the number of white chips in a
sample, find $E\{[X - E(X)][Y - E(Y)]\}$.

12.1.3. Let $f(x,y) = (3x + y)/2$ when $0 < x < 1$; $0 < y < 1$. Find

a. the proportion of the population such that $0 < x < .5$ and $0 < y < .5$

b. $E(X)$ and $E(Y)$

c. variances of $X$ and $Y$

d. $E\{[X - E(X)][Y - E(Y)]\}$

12.2. Regression. Consider a bivariate distribution $f(x,y)$.
Choose a particular value of X, say $x_1$. Now consider the set of
probabilities (or probability densities) $f(x_1,y)$. It is convenient
to treat this set of probabilities (or probability densities) as
though it were itself a fr.f., which we can do easily enough if we
make the sum (or integral) of the $f(x_1,y)$ equal to 1 by multiplying
each $f(x_1,y)$ by a constant. Since

$$\sum_{\text{all } y} f(x_1,y) \quad \left[\text{or } \int_{R_y} f(x_1,y)\,dy\right] = f(x_1)$$

the necessary constant is simply $1/f(x_1)$. We call the probability
(or probability density) $f(x,y)/f(x)$ the conditional probability
(or probability density) of y, given x. The conditional
probability (or probability density) is written $f(y|x)$.

Definition. The conditional probability (or probability density)
$f(y|x)$ is defined as $f(x,y)/f(x)$, that is, $f(y|x) = f(x,y)/f(x)$.
The conditional fr.f. is the class of ordered pairs $[f(y|x), y]$.

Examples. 1. Consider the discrete bivariate distribution given by
$f(x,y) = (x^2 + y)/32$ for $x = 0, 1, 2, 3$ and $y = 0, 1$ (Table 12.1.1). Since
$f(x) = 1/32$ for $x = 0$,

$$f(y|0) = \frac{f(0,y)}{f(0)} = \frac{(0 + y)/32}{1/32} = y$$

Thus $f(0|0) = 0$, $f(1|0) = 1$. Note that $\sum_{y=0}^{1} f(y|0) = 1$. Similarly,

$$f(y|1) = f(1,y)/f(1) = \frac{(1 + y)/32}{3/32} = (1 + y)/3$$

Thus $f(0|1) = 1/3$; $f(1|1) = 2/3$. Again note that $\sum_{y=0}^{1} f(y|1) = 1$. More
generally,

$$f(y|x) = f(x,y)/f(x) = f(x,y)\Big/\sum_{y=0}^{1} f(x,y) = \frac{(x^2 + y)/32}{x^2/32 + (x^2 + 1)/32}$$

$$= \frac{(x^2 + y)/32}{(2x^2 + 1)/32} = \frac{x^2 + y}{2x^2 + 1}$$

From the latter we obtain $f(y|0) = y$; $f(y|1) = (1 + y)/3$ as above, also
$f(y|2) = (4 + y)/9$; $f(y|3) = (9 + y)/19$. Note that

$$\sum_{y=0}^{1} \frac{x^2 + y}{2x^2 + 1} = 1$$


2. Consider the continuous bivariate distribution given by $f(x,y) = 3x^2y + x$
for $0 < x < 1$ and $0 < y < 1$. We have

$$f(y|.5) = \frac{f(.5,y)}{\int_0^1 f(.5,y)\,dy} = \frac{.75y + .5}{\int_0^1 (.75y + .5)\,dy} = \frac{.75y + .5}{(.75y^2/2 + .5y)\big|_0^1}$$

$$= \frac{.75y + .5}{1.75/2} = \frac{1.5y + 1}{1.75}$$

Note that

$$\int_0^1 \frac{1.5y + 1}{1.75}\,dy = \frac{1}{1.75}\left(\frac{1.5y^2}{2} + y\right)\Big|_0^1 = \frac{1}{1.75}\left(\frac{1.5}{2} + 1\right) = 1$$

More generally,

$$f(y|x) = \frac{f(x,y)}{\int_0^1 f(x,y)\,dy} = \frac{3x^2y + x}{\int_0^1 (3x^2y + x)\,dy} = \frac{3x^2y + x}{(3x^2y^2/2 + xy)\big|_0^1}$$

$$= \frac{3x^2y + x}{3x^2/2 + x} = \frac{x(3xy + 1)}{x(3x/2 + 1)} = \frac{6xy + 2}{3x + 2}$$

from which we obtain

$$f(y|.5) = \frac{6(.5)y + 2}{3(.5) + 2} = \frac{3y + 2}{3.50} = \frac{1.5y + 1}{1.75}$$

as above. Note that

$$\int_0^1 \frac{6xy + 2}{3x + 2}\,dy = \frac{1}{3x + 2}\,(3xy^2 + 2y)\Big|_0^1 = \frac{1}{3x + 2}\,(3x + 2) = 1$$
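The discrete case in Example 1 can be checked mechanically. The following is a minimal sketch in Python (not part of the original text), assuming the distribution $f(x,y) = (x^2 + y)/32$ for $x = 0, 1, 2, 3$ and $y = 0, 1$; the function names are illustrative only. Exact fractions are used so that results can be compared directly with the values worked out by hand.

```python
from fractions import Fraction

XS = range(4)   # x = 0, 1, 2, 3
YS = range(2)   # y = 0, 1

def joint(x, y):
    """Joint probability f(x, y) = (x^2 + y)/32 from Example 1."""
    return Fraction(x * x + y, 32)

def marginal_x(x):
    """Marginal f(x): sum the joint probabilities over all y."""
    return sum(joint(x, y) for y in YS)

def conditional_y_given_x(y, x):
    """Conditional probability f(y|x) = f(x, y)/f(x)."""
    return joint(x, y) / marginal_x(x)

# Each conditional fr.f. should sum to 1 over y.
for x in XS:
    row = [conditional_y_given_x(y, x) for y in YS]
    print(x, row, sum(row))
```

Each printed row lists $f(0|x)$ and $f(1|x)$ followed by their sum, which can be compared with the closed form $(x^2 + y)/(2x^2 + 1)$.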


If the conditional distribution of Y is equal to the marginal
distribution of Y for all values of X, then we say that Y is
independent of X.

Definition. Y is independent of X if and only if $f(y|x) = f(y)$
for all values of X.

Note that, if $f(y|x) = f(y)$, then $f(x,y)/f(x) = f(y)$ and therefore
$f(x,y) = f(x)f(y)$. Further, if $f(x,y) = f(x)f(y)$, then

$$f(y|x) = f(y)$$

Therefore, a necessary and sufficient condition for the independence
of Y is that $f(x,y) = f(x)f(y)$. It can easily be shown that
X is independent of Y if and only if Y is independent of X.
Therefore, X and Y are independent if and only if

$$f(x,y) = f(x)f(y)$$

We can define moments and other statistics (or parameters)
for conditional fr.f.'s in the same way as for other fr.f.'s. It is
useful in many applications to consider the way in which the
mean (or some other measure of central tendency) of the conditional
distribution of Y varies as X varies. The curve which
the mean (or other measure) describes is called a regression curve
and is said to represent the regression of Y on X. The regression
of X on Y is defined analogously. Letting $m_{y|x}$ and $m_{x|y}$ denote
the means of the conditional distributions of Y and X respectively,
the mean regression curves are illustrated in Fig. 12.2.1.

Fig. 12.2.1. Mean regression curves.


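Examples. 1. For the discrete bivariate distribution mentioned in Example 1
above, we have

$$m_{y|x} = \sum_{y=0}^{1} y f(y|x) = 0 f(0|x) + 1 f(1|x) = f(1|x) = \frac{x^2 + 1}{2x^2 + 1}$$

Thus

$$m_{y|0} = \frac{0 + 1}{0 + 1} = 1 \qquad m_{y|1} = \frac{1 + 1}{2 + 1} = \frac{2}{3}$$

$$m_{y|2} = \frac{4 + 1}{8 + 1} = \frac{5}{9} \qquad m_{y|3} = \frac{9 + 1}{18 + 1} = \frac{10}{19}$$

Similarly,

$$m_{x|y} = \sum_{x=0}^{3} x f(x|y) \quad \text{etc.}$$

2. For the continuous bivariate distribution mentioned in Example 2
above, we have

$$m_{y|x} = \int_0^1 y f(y|x)\,dy = \int_0^1 y\,\frac{6xy + 2}{3x + 2}\,dy = \frac{1}{3x + 2}\int_0^1 (6xy^2 + 2y)\,dy$$

$$= \frac{1}{3x + 2}\,(2xy^3 + y^2)\Big|_0^1 = \frac{2x + 1}{3x + 2}$$

Thus

$$m_{y|0} = 1/2 \qquad m_{y|.5} = 2/3.5 \qquad \text{etc.}$$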

The student should plot this regression curve as an exercise. 
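As an aid to plotting, the two mean regression curves of Y on X can be tabulated. The sketch below is an illustration in Python rather than anything from the original text; it assumes the discrete distribution $f(x,y) = (x^2 + y)/32$ of Example 1 and the closed form $m_{y|x} = (2x + 1)/(3x + 2)$ obtained in Example 2. The function names are hypothetical.

```python
from fractions import Fraction

def mean_y_given_x_discrete(x):
    """m_{y|x} for f(x, y) = (x^2 + y)/32, x = 0..3, y = 0, 1."""
    joint = {y: Fraction(x * x + y, 32) for y in (0, 1)}
    f_x = sum(joint.values())                 # marginal f(x)
    return sum(y * p / f_x for y, p in joint.items())

def mean_y_given_x_continuous(x):
    """Closed form m_{y|x} = (2x + 1)/(3x + 2) derived in Example 2."""
    return (2 * x + 1) / (3 * x + 2)

# Points on each regression curve, ready to be plotted by hand.
print([mean_y_given_x_discrete(x) for x in range(4)])
print([round(mean_y_given_x_continuous(x / 4), 4) for x in range(5)])
```

The second list evaluates the continuous curve at x = 0, .25, .5, .75, 1, which is enough to see how slowly it bends.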

Suppose that, knowing the value of X, we wish to estimate the
value of Y. By using $m_{y|x}$ as our estimate of y for all values of
X we can minimize the mean of squares of "errors," defining
"error" as the deviation of y from the estimate. This follows
from the statement in Sec. 4.5 that for any distribution the
second moment about a point c is a minimum when c is the
mean. This important minimum property is summed up by
saying that, of all curves that can be drawn in the xy plane, the
mean regression curve minimizes the mean of squares of deviations
from the curve, the deviations being measured along the y
axis when the curve shows the regression of Y on X and along
the x axis when the curve shows the regression of X on Y.

Examples. 1. In Example 1 above, we found the four means defining the
mean regression of Y on X. The mean squared deviation from $m_{y|0}$ in $f(y|0)$
is of course 0, since all the probability in this conditional distribution is
concentrated at one point. The mean squared deviation from $m_{y|1}$ in $f(y|1)$ is

$$\sum_{y=0}^{1} (y - m_{y|1})^2 f(y|1) = \sum_{y=0}^{1} (y^2 - 2y\,m_{y|1} + m_{y|1}^2) f(y|1)$$

$$= m_{y|1}^2 f(0|1) + (1 - 2m_{y|1} + m_{y|1}^2) f(1|1) = (4/9)(1/3) + (1 - 4/3 + 4/9)(2/3)$$

$$= 4/27 + 2/27 = 2/9$$

The other two mean squared deviations are found analogously. The
student can choose any point other than $m_{y|x}$ and verify for himself that the
mean squared deviation from this point is larger than the mean squared
deviation from $m_{y|x}$.

2. In Example 2 above, the mean squared deviation is given by the integral
$\int_0^1 (y - m_{y|x})^2 f(y|x)\,dy$. For example,

$$\int_0^1 (y - m_{y|0})^2 f(y|0)\,dy = \int_0^1 (y - 1/2)^2 (1)\,dy = \int_0^1 (y^2 - y + 1/4)\,dy$$

$$= (y^3/3 - y^2/2 + y/4)\Big|_0^1 = 1/3 - 1/2 + 1/4 = 1/12$$

The student should have recognized that the mean squared
deviation is simply the variance of the conditional distribution.
For computational purposes, note that

$$\sum_{\text{all } y} (y - m_{y|x})^2 f(y|x) = \sum_{\text{all } y} (y^2 - 2y\,m_{y|x} + m_{y|x}^2) f(y|x)$$

$$= \sum y^2 f(y|x) - 2m_{y|x} \sum y f(y|x) + m_{y|x}^2 \sum f(y|x)$$

$$= \sum y^2 f(y|x) - 2m_{y|x}(m_{y|x}) + m_{y|x}^2 (1) = \sum y^2 f(y|x) - m_{y|x}^2$$

Similarly,

$$\int_{R_y} (y - m_{y|x})^2 f(y|x)\,dy = \int_{R_y} (y^2 - 2m_{y|x}\,y + m_{y|x}^2) f(y|x)\,dy$$

$$= \int_{R_y} y^2 f(y|x)\,dy - 2m_{y|x} \int_{R_y} y f(y|x)\,dy + m_{y|x}^2 \int_{R_y} f(y|x)\,dy$$

$$= \int_{R_y} y^2 f(y|x)\,dy - m_{y|x}^2$$

This is basically the same computational theorem as that shown
in Sec. 4.3. We have shown that

$$E[Y - E(Y)]^2 = E(Y^2) - [E(Y)]^2$$

for a conditional distribution.
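The computational theorem can be confirmed numerically for the discrete distribution of Example 1. The following Python sketch is an illustration only, with hypothetical function names; it assumes $f(x,y) = (x^2 + y)/32$ and computes the conditional variance both from the definition and from the shortcut $\sum y^2 f(y|x) - m_{y|x}^2$.

```python
from fractions import Fraction

def conditional_pmf(x):
    """f(y|x) for f(x, y) = (x^2 + y)/32 with y = 0, 1."""
    f_x = Fraction(2 * x * x + 1, 32)          # marginal f(x)
    return {y: Fraction(x * x + y, 32) / f_x for y in (0, 1)}

def variance_by_definition(pmf):
    """Mean squared deviation from the conditional mean."""
    mean = sum(y * p for y, p in pmf.items())
    return sum((y - mean) ** 2 * p for y, p in pmf.items())

def variance_by_shortcut(pmf):
    """Second moment about zero minus the squared mean."""
    mean = sum(y * p for y, p in pmf.items())
    return sum(y * y * p for y, p in pmf.items()) - mean ** 2

for x in range(4):
    pmf = conditional_pmf(x)
    print(x, variance_by_definition(pmf), variance_by_shortcut(pmf))
```

For x = 1 both routes should reproduce the value 2/9 found by hand in the example above.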

EXERCISES 

12.2.1. Let $f(x,y) = 3(x^2 + y^2)/2$ when $0 < x < 1$; $0 < y < 1$. Find

a. $f(x)$ and $f(y)$

b. $f(y|x)$

c. $m_{y|x}$

12.2.2. Let f(x,y) = x 2 + x?//12 + y 2 when 0<x<l;0<y<l. Find 

a. $f(x)$

b. $f(y|x)$

12.2.3. Let $f(x,y) = (x + 3x + y)/18$ when $x = 0, 1$ and $y = 0, 1, 2$. Find

a. $f(x)$

b. $f(y|x)$

c. $m_{y|x}$

12.3. Linear Regression. Of particular interest are those
bivariate distributions in which the mean regression of Y on X is
linear, that is, $m_{y|x} = ax + b$, in which a and b are constants.
For these distributions a moment of considerable importance is
the moment with $c_1 = E(X)$, $c_2 = E(Y)$, $k_1 = 1$, $k_2 = 1$, that is,
the moment $E\{[X - E(X)][Y - E(Y)]\}$. This moment is
called the product moment or the covariance (Cov). For computational
purposes, note that

$$E\{[X - E(X)][Y - E(Y)]\} = E[XY - YE(X) - XE(Y) + E(X)E(Y)]$$

$$= E(XY) - E(X)E(Y)$$


When X and Y are independent, $E(XY) = E(X)E(Y)$ (this
should be proved as an exercise); thus $\text{Cov}_{xy} = 0$.

The ratio of the covariance to the geometric mean of the two
variances is called the product-moment correlation coefficient
(p-mcc); that is,

$$\text{p-mcc} = \frac{\text{Cov}_{xy}}{\sqrt{\text{Var}_x \text{Var}_y}}$$

We denote the parameter p-mcc by $\rho$ and the statistic p-mcc
by r. The p-mcc is defined for all bivariate distributions,
whether the mean regression is linear or not (but see below).
The maximum value of the p-mcc is 1 and the minimum value
is -1. An absolute value of 1 is obtained when and only when
all points of the distribution lie on the mean regression line and
the mean regression is linear. To show that the maximum
absolute value is 1, we note that, as the square of any quantity is
positive or zero,

$$E[\sigma_y(X - \mu_x) \pm \sigma_x(Y - \mu_y)]^2 \ge 0$$

Squaring, we obtain

$$E[\sigma_y^2(X - \mu_x)^2 \pm 2\sigma_x\sigma_y(X - \mu_x)(Y - \mu_y) + \sigma_x^2(Y - \mu_y)^2] \ge 0$$

$$\sigma_y^2 E(X - \mu_x)^2 \pm 2\sigma_x\sigma_y E[(X - \mu_x)(Y - \mu_y)] + \sigma_x^2 E(Y - \mu_y)^2 \ge 0$$

$$\sigma_y^2\sigma_x^2 \pm 2\sigma_x\sigma_y \text{Cov}_{xy} + \sigma_x^2\sigma_y^2 \ge 0$$

$$2\sigma_x^2\sigma_y^2 \ge \mp 2\sigma_x\sigma_y \text{Cov}_{xy}$$

$$\sigma_x\sigma_y \ge \mp \text{Cov}_{xy}$$

$$1 \ge \mp \frac{\text{Cov}_{xy}}{\sigma_x\sigma_y} = \mp\rho$$

Since the inequality holds with either sign, $-1 \le \rho \le 1$.
An analogous proof holds for r, the sample p-mcc.

If X and Y are independent, then p-mcc = 0, but the converse
is not true, that is, p-mcc may be 0 even when X and Y are not
independent, if the mean regression curve is not linear. An
example of a bivariate distribution with p-mcc = 0 and with X
and Y clearly not independent is given in Table 12.3.1.

Table 12.3.1. Bivariate Distribution with p-mcc = 0 and X and
Y Not Independent

         3 |   0     .3     0
    y    2 |  .1     .2    .1
         1 |  .1     .1    .1
           +-------------------
               1      2     3
                      x

The student can verify that p-mcc = 0 in this distribution,
yet for no point (x,y) does $f(x,y) = f(x)f(y)$. In this distribution
the mean regression of Y on X is nonlinear, as the student
can verify. As a matter of fact, in all cases in which the mean
regression is linear, the p-mcc is 0 if and only if X and Y are
independent. The p-mcc is an adequate measure of relationship
only if the mean regression curves are linear.
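The two claims about Table 12.3.1 (zero covariance, yet no point at which the joint probability factors) can be checked directly. The sketch below is an illustrative Python check, not part of the original text. It assumes the cell probabilities shown in the table and, as a further assumption, that the three columns correspond to x = 1, 2, 3; any equally spaced x values would give the same conclusion.

```python
from fractions import Fraction as F

# Table 12.3.1, keyed by (x, y); the x values 1, 2, 3 are an assumption.
TABLE = {
    (1, 3): F(0),     (2, 3): F(3, 10), (3, 3): F(0),
    (1, 2): F(1, 10), (2, 2): F(2, 10), (3, 2): F(1, 10),
    (1, 1): F(1, 10), (2, 1): F(1, 10), (3, 1): F(1, 10),
}

def marginal(table, axis):
    """Marginal distribution over axis 0 (x) or axis 1 (y)."""
    out = {}
    for key, p in table.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

def covariance(table):
    """Cov = E(XY) - E(X)E(Y)."""
    ex = sum(x * p for (x, y), p in table.items())
    ey = sum(y * p for (x, y), p in table.items())
    exy = sum(x * y * p for (x, y), p in table.items())
    return exy - ex * ey

def independent_points(table):
    """Cells where f(x, y) happens to equal f(x) f(y)."""
    fx, fy = marginal(table, 0), marginal(table, 1)
    return [k for k, p in table.items() if p == fx[k[0]] * fy[k[1]]]

print(covariance(TABLE), independent_points(TABLE))
```

A zero covariance together with an empty list of factoring cells is exactly the situation the text describes.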

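Examples. 1. Consider the discrete bivariate distribution given by the
following table [entries are $f(x,y)$].

         3 |  1/16   2/16   2/16
    y    2 |  2/16   1/16    0
         1 |  2/16   2/16    0
         0 |  4/16    0      0
           +----------------------
                0      1      2
                       x

The covariance of the above distribution is obtained as follows:

$$E(XY) = \sum_{y=0}^{3}\sum_{x=0}^{2} xy f(x,y) = (1)(1)\frac{2}{16} + (1)(2)\frac{1}{16} + (1)(3)\frac{2}{16} + (2)(3)\frac{2}{16} = \frac{22}{16}$$

$$E(X) = (1)\frac{5}{16} + (2)\frac{2}{16} = \frac{9}{16}$$

$$E(Y) = (1)\frac{4}{16} + (2)\frac{3}{16} + (3)\frac{5}{16} = \frac{25}{16}$$

Therefore

$$\text{Cov}_{xy} = \frac{22}{16} - \left(\frac{9}{16}\right)\left(\frac{25}{16}\right) = \frac{127}{16^2}$$

Further,

$$\text{Var}_x = E(X^2) - [E(X)]^2 = (1)\frac{5}{16} + (4)\frac{2}{16} - \left(\frac{9}{16}\right)^2 = \frac{127}{16^2}$$

$$\text{Var}_y = (1)\frac{4}{16} + (4)\frac{3}{16} + (9)\frac{5}{16} - \left(\frac{25}{16}\right)^2 = \frac{351}{16^2}$$

Therefore

$$\rho = \frac{\text{Cov}_{xy}}{\sqrt{\text{Var}_x \text{Var}_y}} = \frac{127/16^2}{\sqrt{(127/16^2)(351/16^2)}} = \frac{127}{\sqrt{(127)(351)}} = \sqrt{\frac{127}{351}} = .60$$

The student should verify that the mean regression of Y on X is linear
while the mean regression of X on Y is nonlinear. For this particular
distribution therefore $\rho$ is less suitable if we are interested in the mean
regression of X on Y than if we are interested in the mean regression of Y on X.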
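2. Consider the distribution discussed in Example 2 in the three sets of
examples given in Sec. 12.2. This is the distribution given by

$$f(x,y) = 3x^2y + x$$

for $0 < x < 1$ and $0 < y < 1$. We have

$$E(XY) = \int_0^1\int_0^1 xy f(x,y)\,dx\,dy = \int_0^1\int_0^1 (3x^3y^2 + x^2y)\,dx\,dy$$

$$= \int_0^1 \left(\frac{3y^2}{4} + \frac{y}{3}\right)dy = \left(\frac{y^3}{4} + \frac{y^2}{6}\right)\Big|_0^1 = \frac{1}{4} + \frac{1}{6} = \frac{5}{12}$$

$$E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x\left(\frac{3x^2}{2} + x\right)dx = \int_0^1 \left(\frac{3x^3}{2} + x^2\right)dx$$

$$= \left(\frac{3x^4}{8} + \frac{x^3}{3}\right)\Big|_0^1 = \frac{3}{8} + \frac{1}{3} = \frac{17}{24}$$

$$E(Y) = \int_0^1 y f(y)\,dy = \int_0^1 y\left(y + \frac{1}{2}\right)dy = \left(\frac{y^3}{3} + \frac{y^2}{4}\right)\Big|_0^1 = \frac{1}{3} + \frac{1}{4} = \frac{7}{12}$$

Therefore

$$\text{Cov}_{xy} = E(XY) - E(X)E(Y) = \frac{5}{12} - \left(\frac{17}{24}\right)\left(\frac{7}{12}\right) = (120 - 119)/288 = \frac{1}{288}$$

Also

$$E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 \left(\frac{3x^4}{2} + x^3\right)dx = \left(\frac{3x^5}{10} + \frac{x^4}{4}\right)\Big|_0^1 = \frac{11}{20}$$

$$\text{Var}_x = E(X^2) - [E(X)]^2 = \frac{11}{20} - \left(\frac{17}{24}\right)^2 = \frac{139}{(24^2)(5)}$$

We had found previously that $m_{y|x} = (2x + 1)/(3x + 2)$, which is
nonlinear. Thus for this distribution, if we are interested in the regression of Y
on X, $\rho$ is not the best measure of relationship; however, as the student can
see by plotting $(2x + 1)/(3x + 2)$ for several points between 0 and 1, the
approximation to linearity is quite good.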
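The moments in Example 2 involve only polynomial integrals over the unit square, so they can be verified exactly. The following Python sketch is offered as an illustration, not as part of the original text. It stores the density $f(x,y) = 3x^2y + x$ as a table of monomial coefficients (a representation chosen here for convenience) and uses the fact that the integral of $x^a y^b$ over $0 < x < 1$, $0 < y < 1$ is $1/[(a+1)(b+1)]$. The value of $\text{Var}_y$ it prints is computed in the same way and is not a figure quoted from the text.

```python
from fractions import Fraction

# f(x, y) = 3x^2 y + x stored as {(power of x, power of y): coefficient}.
DENSITY = {(2, 1): Fraction(3), (1, 0): Fraction(1)}

def expectation(i, j, density=DENSITY):
    """E(X^i Y^j) over the unit square, integrating term by term."""
    total = Fraction(0)
    for (a, b), coeff in density.items():
        # integral of x^(a+i) y^(b+j) over 0 < x < 1, 0 < y < 1
        total += coeff / ((a + i + 1) * (b + j + 1))
    return total

e_x, e_y, e_xy = expectation(1, 0), expectation(0, 1), expectation(1, 1)
cov_xy = e_xy - e_x * e_y
var_x = expectation(2, 0) - e_x ** 2
var_y = expectation(0, 2) - e_y ** 2   # same method; not quoted in the text
print(e_xy, e_x, e_y, cov_xy, var_x, var_y)
```

The small covariance relative to the variances is consistent with the remark that the regression curve, though nonlinear, is nearly flat over the unit interval.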

We state without proof¹ that in any bivariate population with
linear mean regression, the constant $a = \rho(\sigma_y/\sigma_x)$ and the
constant $b = \mu_y - \rho(\sigma_y/\sigma_x)\mu_x$, so that the mean regression equation
becomes $m_{y|x} = \rho(\sigma_y/\sigma_x)(x - \mu_x) + \mu_y$. Note that the point
$(\mu_x, \mu_y)$ satisfies this equation, that is, the regression line passes
through this point.

It is customary in many applications to assume that in the
bivariate population being sampled the mean regression of Y on
X is linear. In accordance with this assumption, a prediction
equation of the form $y_p = ax + b$ is found which predicts a
value $y_p$ for each value of X such that the total sum of squares
of deviations $\sum (y_i - y_{p_i})^2$ is minimized. We state without
proof¹ that, for a sample of size n, this sum of squares is a minimum
when

$$a = r\frac{s_y}{s_x} \qquad b = m_y - r\frac{s_y}{s_x}m_x$$

Therefore the prediction (regression) equation becomes

$$y_p = r\frac{s_y}{s_x}(x - m_x) + m_y$$

In this equation r is the sample p-mcc and $m_x$, $m_y$, $s_x$, $s_y$ are the
sample means and standard deviations respectively. Note that
when $x = m_x$, $y_p = m_y$, that is, the point $(m_x, m_y)$ lies on the
prediction curve.

¹ A proof is given in the Appendix, Sec. B.25.

Note that

$$\sum (y_{p_i} - y_i) = \sum y_{p_i} - \sum y_i = r\frac{s_y}{s_x}\sum x_i - nr\frac{s_y}{s_x}m_x + nm_y - nm_y = 0$$

since $\sum x_i = nm_x$. Since the mean error is zero, the standard deviation
of the errors can be taken as a measure of adequacy of prediction. The
standard deviation of the errors is called the standard error of
estimate ($s_e$), that is,

$$s_e^2 \equiv \frac{\sum_{i=1}^{n} (y_{p_i} - y_i)^2}{n}$$

The student can prove as an exercise that $s_e^2 = s_y^2(1 - r^2)$.
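Before attempting the proof, the two properties just stated (mean error zero, and $s_e^2 = s_y^2(1 - r^2)$) can be observed on a small sample. The Python sketch below is illustrative only: the six data pairs are invented for the purpose, the function names are hypothetical, and standard deviations are computed with divisor n, which is assumed to match the text's definition of $s_e$.

```python
import math

def summary(xs, ys):
    """Sample means, standard deviations (divisor n), and r."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - m_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - m_y) ** 2 for y in ys) / n)
    cov = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / n
    return m_x, m_y, s_x, s_y, cov / (s_x * s_y)

def predict(x, m_x, m_y, s_x, s_y, r):
    """Prediction equation y_p = r (s_y / s_x)(x - m_x) + m_y."""
    return r * (s_y / s_x) * (x - m_x) + m_y

# Hypothetical sample, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

m_x, m_y, s_x, s_y, r = summary(xs, ys)
errors = [predict(x, m_x, m_y, s_x, s_y, r) - y for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
s_e_squared = sum(e ** 2 for e in errors) / len(errors)
print(mean_error, s_e_squared, s_y ** 2 * (1 - r ** 2))
```

Up to floating-point rounding, the mean error should vanish and the last two printed numbers should agree.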

Example. Suppose we have drawn a sample of 100 from a bivariate population
in which we assume the mean regression curves are linear. Our sample
yields the following statistics:

$$m_x = 101.2 \qquad m_y = 53.7$$
$$s_x^2 = 6.25 \qquad s_y^2 = 4.00$$
$$\text{Cov}_{xy} = 3.50$$

Then

$$r = \frac{3.50}{\sqrt{(6.25)(4.00)}} = \frac{3.50}{\sqrt{25}} = .70$$

And

$$y_p = .70\,\frac{2.00}{2.50}\,(x - 101.2) + 53.7 = .56x - 2.972$$

This regression equation can be used in the following way. Suppose we
now observe the value of X for a new member (X,Y) from the same bivariate
population. We can predict the value of Y by substituting the value of X in
the above equation. The standard error of estimate in our example is
$s_e = \sqrt{4.00(1 - .49)} = 1.43$.
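The arithmetic of this example can be retraced in a few lines. The Python sketch below is illustrative and uses only the sample statistics quoted in the example; the slope is taken as $r(s_y/s_x)$, with $s_y/s_x$ the ratio of standard deviations (not of variances), in line with the prediction equation $y_p = r(s_y/s_x)(x - m_x) + m_y$.

```python
import math

# Sample statistics quoted in the example.
m_x, m_y = 101.2, 53.7
var_x, var_y, cov_xy = 6.25, 4.00, 3.50

s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)
r = cov_xy / (s_x * s_y)                 # sample p-mcc
slope = r * s_y / s_x                    # a = r (s_y / s_x)
intercept = m_y - slope * m_x            # b = m_y - a m_x
s_e = math.sqrt(var_y * (1 - r ** 2))    # standard error of estimate
print(round(r, 2), round(slope, 3), round(intercept, 3), round(s_e, 2))
```

Note that the slope can equally be obtained as $\text{Cov}_{xy}/s_x^2$, which gives a quick cross-check.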
EXERCISES

12.3.1. Find the regression of Y on X for the bivariate discrete distribution
given by the following table:


5 

00 

& 

00 

SR 

00 

y 4 

Ms 

00 

SR 

00 

CO 

CO 

SR 

00 

X 

00 


0 

1 

2 


x 

Is the regression linear? 

12.3.2. In a study of the relationship between performance-on-the-job
scores (Y) and selection-test scores (X), a sample of 400 pairs (X,Y) yields
the following statistics:

$$m_x = 0 \qquad m_y = 10$$
$$s_x^2 = 25 \qquad s_y^2 = 9$$
$$r = .50$$

Find the linear regression of Y on X.

12.3.3. Find the regression of Y on X for the continuous bivariate distribution
given by $f(x,y) = 2x/y$ for $0 < x < 1$, $1 < y < e$. Is the regression
linear?

12.3.4. Prove that, for a bivariate population,
$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2\rho\sigma_x\sigma_y$
and $\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y$.

12.3.5. Prove that, for a bivariate sample,
$s_{x+y}^2 = s_x^2 + s_y^2 + 2rs_xs_y$ and
$s_{x-y}^2 = s_x^2 + s_y^2 - 2rs_xs_y$.

12.3.6. Prove that linear transformations of x and y into w and z respectively
do not change the size of the p-mcc; that is, that $r_{wz} = r_{xy}$ (and $\rho_{wz} = \rho_{xy}$).

12.4. The Correlation Ratio. As we have seen, the p-mcc is
an adequate measure of the relation between two variates only
when the mean regression curves are linear. A measure of the
relation which is adequate in some cases of nonlinear, as well as
linear, regression is the correlation ratio.

Consider a regression curve $y_p = m_{y|x}$. A measure of how
closely this curve fits all the points in the bivariate distribution
is simply the variance of the errors, or $E[(y_p - Y)^2]$. In
considering how adequate a fit this is, however, we should take into
consideration the variance of Y itself, which we can do by dividing
the variance of the errors (vertical deviations) by the variance
of Y. This ratio leads to the correlation ratio eta ($\eta$), which
is defined as follows.

Definition.

$$\eta_{yx}^2 \equiv 1 - \frac{E[(y_p - Y)^2]}{\sigma_y^2}$$

in which $y_p = m_{y|x}$.

When the fit is perfect, that is, all the points lie on the regression
curve $y_p = m_{y|x}$, then $E[(y_p - Y)^2] = 0$ and $\eta_{yx}^2 = 1$. When the
points are scattered around the regression curve just as much as
they are in the marginal distribution of y, we have
$E[(y_p - Y)^2] = \sigma_y^2$ and $\eta_{yx}^2 = 0$.

For any bivariate distribution, $0 \le \text{p-mcc}^2 \le \eta^2 \le 1$. Only
when the mean regression curve is linear does $\text{p-mcc}^2 = \eta^2$.
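The inequality $\text{p-mcc}^2 \le \eta^2$ can be seen at work on a distribution whose regression is nonlinear. The Python sketch below is an illustration, not part of the original text. It uses the cell probabilities of Table 12.3.1 and assumes, for definiteness, that the columns of that table correspond to x = 1, 2, 3; the function names are hypothetical.

```python
from fractions import Fraction as F

# Joint probabilities keyed by (x, y), taken from Table 12.3.1 with the
# x values assumed to be 1, 2, 3.
TABLE = {
    (1, 3): F(0),     (2, 3): F(3, 10), (3, 3): F(0),
    (1, 2): F(1, 10), (2, 2): F(2, 10), (3, 2): F(1, 10),
    (1, 1): F(1, 10), (2, 1): F(1, 10), (3, 1): F(1, 10),
}

def correlation_ratio_squared(table):
    """eta^2 of Y on X: 1 - E[(m_{y|x} - Y)^2] / Var(Y)."""
    f_x, sum_y = {}, {}
    for (x, y), p in table.items():
        f_x[x] = f_x.get(x, 0) + p
        sum_y[x] = sum_y.get(x, 0) + y * p
    cond_mean = {x: sum_y[x] / f_x[x] for x in f_x}   # m_{y|x}
    e_y = sum(y * p for (x, y), p in table.items())
    var_y = sum((y - e_y) ** 2 * p for (x, y), p in table.items())
    err = sum((cond_mean[x] - y) ** 2 * p for (x, y), p in table.items())
    return 1 - err / var_y

def rho_squared(table):
    """Square of the p-mcc: Cov^2 / (Var_x Var_y)."""
    e_x = sum(x * p for (x, y), p in table.items())
    e_y = sum(y * p for (x, y), p in table.items())
    cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in table.items())
    var_x = sum((x - e_x) ** 2 * p for (x, y), p in table.items())
    var_y = sum((y - e_y) ** 2 * p for (x, y), p in table.items())
    return cov ** 2 / (var_x * var_y)

print(rho_squared(TABLE), correlation_ratio_squared(TABLE))
```

Here the p-mcc registers no relation at all, while the correlation ratio picks up the rise and fall of the conditional means.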


EXERCISES

12.4.1. Find $\rho$ and $\eta_{yx}$ for the discrete bivariate distribution given by the
following table:

         2 |   0     .3     0
    y    1 |  .1     .1    .1
         0 |  .2      0    .2
           +-------------------
               0      1     2
                      x

Why is $\eta_{yx}$ so much larger than $\rho$?

12.4.2. Find $\rho$ and $\eta_{yx}$ for the following distribution.

         3 |   0     .1    .2
    y    2 |   0     .2    .1
         1 |  .3     .1     0
           +-------------------
               0      1     2
                      x

Why are $\rho$ and $\eta_{yx}$ almost the same size?

12.4.3. Whereas $\rho_{xy} = \rho_{yx}$, is it true in general that $\eta_{yx} = \eta_{xy}$?

12.4.4. If in a discrete bivariate distribution there are only 2 values of X,
why must $\rho = \eta_{yx}$?




12.5. The Sampling Distribution of r. Often we wish to test
the hypothesis that in the population sampled the p-mcc $\rho$ has a
specified value, for example, 0. The sampling distribution of r
depends not only upon the size of $\rho$, but also upon the form of
the distribution of the population. We shall concern ourselves
only with one particular form of bivariate distribution, called
the bivariate normal distribution. This distribution is defined as
a distribution given by a rule of the form

$$f(x,y) = k \exp\left\{-\tfrac{1}{2}\left[a_1(x - c_1)^2 - 2a_2(x - c_1)(y - c_2) + a_3(y - c_2)^2\right]\right\}$$

in which $k$, $a_1$, $a_2$, $a_3$, $c_1$, and $c_2$ are constants, $k > 0$, $a_1 > 0$,
$a_3 > 0$, and $a_1a_3 > a_2^2$.

We state without proof¹ that

$$\mu_x = c_1 \qquad \mu_y = c_2 \qquad k = \frac{\sqrt{a_1a_3 - a_2^2}}{2\pi}$$

$$\sigma_x^2 = \frac{a_3}{a_1a_3 - a_2^2} \qquad \sigma_y^2 = \frac{a_1}{a_1a_3 - a_2^2} \qquad \text{Cov}_{xy} = \frac{a_2}{a_1a_3 - a_2^2}$$

Therefore

$$\rho^2 = \frac{\text{Cov}_{xy}^2}{\sigma_x^2\sigma_y^2} = \frac{a_2^2}{a_1a_3}$$

We can therefore write the bivariate normal distribution in the
following form:

$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_x)^2}{\sigma_x^2} - 2\rho\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2}\right]\right\}$$

When the rule is written in this form, it is easily seen that
$f(x,y)$ is a maximum at the point $(\mu_x, \mu_y)$ (see Sec. 9.2). It can
also be shown² that, if points of equal probability density are
connected by a curve in the xy plane, the curve is an ellipse for
any probability density between zero and the maximum.
These ellipses of equal probability density are illustrated in Fig.
12.5.1. Many populations sampled in applications are believed
to have approximately this distribution.

¹ Proofs are given by Cramér (3) and Wilks (12).

² A proof is given in the Appendix, Sec. B.26.
We also state without proof¹ that for a bivariate normal distribution
the two marginal distributions are each normal and,
further, all conditional distributions are normal.

Fig. 12.5.1. Ellipses of equal probability density for a bivariate normal
distribution, showing the mean regression lines.

The sampling distribution of r for a bivariate normal distribution
is given by a complicated expression which we do not reproduce
here; a remarkable property of this sampling distribution
is that it depends only upon $\rho$ and n. We state without proof²
that the transformed variate $z = \frac{1}{2}\log_e\frac{1 + r}{1 - r}$ is distributed
approximately according to $\phi\left(\frac{1}{2}\log_e\frac{1 + \rho}{1 - \rho},\ \frac{1}{n - 3}\right)$, in which $\rho$
is the population p-mcc and n is the size of the sample. Remembering
that log 1 = 0, note that when $\rho = 0$ the approximate
distribution of z is normal with mean 0 and variance $1/(n - 3)$.*

Corollary. Let $\zeta = \frac{1}{2}\log_e\frac{1 + \rho}{1 - \rho}$. Then $\frac{z - \zeta}{\sqrt{1/(n - 3)}}$ is
distributed approximately according to $\phi(0,1)$.

Corollary.

$$\text{prob}\left(z \pm b\sqrt{\frac{1}{n - 3}} \text{ includes } \zeta\right) = \int_{-b}^{b} \phi(0,1)\,dx$$

Corollary. Consider two samples of sizes $n_1$ and $n_2$ respectively,
each drawn independently from normal bivariate populations.
Then $f(z_1 - z_2)$ is approximately

$$\phi\left(\zeta_1 - \zeta_2,\ \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}\right)$$

Therefore $\dfrac{(z_1 - z_2) - (\zeta_1 - \zeta_2)}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$ is distributed approximately
according to $\phi(0,1)$.

Tables of the r-to-z transformation and the z-to-r transformation
can be found in Appendix C. With these tables and the
above corollaries and theorems, the student should be able to
test hypotheses about $\rho$ and $\rho_1 - \rho_2$ and find confidence intervals
for $\rho$ when samples from bivariate normal populations are
available.

¹ A proof is given in the Appendix, Sec. B.26.

² A proof is given by Cramér (3).

* When $\rho = 0$, r is also approximately normally distributed, with mean 0 and
variance $1/(n - 1)$.


Examples. 1. A sample of 84 from a bivariate population assumed to be
approximately normal yields an r of .70.

a. Is it reasonable to suppose that $\rho = 0$? If $\rho$ were 0 the distribution of z
would be approximately $N(0, 1/81)$. Transforming r to z, we obtain z = .87
for r = .70 (from the Appendix). Then

$$\text{C.R.} = \frac{.87}{\sqrt{1/81}} = \frac{.87}{1/9} = 7.8$$

The hypothesis is highly untenable (p < .001).

b. Is it reasonable to suppose that $\rho = .30$? We transform $\rho$ to $\zeta$, obtaining
$\zeta = .31$ for $\rho = .30$. Then

$$\text{C.R.} = \frac{z - \zeta}{1/9} = \frac{.87 - .31}{1/9} = 5.0$$

Therefore we can also reject the hypothesis that $\rho = .30$ at the .001 level of
confidence.

c. What is the .01 confidence interval for $\rho$? To find this interval we first
find a confidence interval for $\zeta$ and then transform the two end values of $\zeta$ into
$\rho$. The confidence interval for $\zeta$ is .87 ± 2.58(1/9) or .58 to 1.15. The
confidence interval for $\rho$ is therefore .52 to .82. Note that r is not at the mid-point
of the confidence interval for $\rho$.

2. A sample of 53 is drawn from each of two supposedly normal bivariate
populations. The sample r's are $r_1 = .80$ and $r_2 = .50$. Is $\rho_1 = \rho_2$? To
test the hypothesis that $\rho_1 = \rho_2$, we compute

$$\text{C.R.} = \frac{(z_1 - z_2) - 0}{\sqrt{1/50 + 1/50}} = \frac{1.10 - .55}{\sqrt{1/25}} = \frac{.55}{1/5} = 2.75$$

Thus we can reject this hypothesis at the .01 level of confidence.
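For readers without the tables of Appendix C at hand, the calculations in these examples can be retraced with the natural logarithm and hyperbolic tangent, since the r-to-z transformation is $z = \frac{1}{2}\log_e\frac{1+r}{1-r}$ and its inverse is $r = \tanh z$. The Python sketch below is illustrative only; the function names are hypothetical, and small differences from the worked figures arise because the text rounds z to two decimals before computing.

```python
import math

def fisher_z(r):
    """z = (1/2) log_e[(1 + r)/(1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def critical_ratio(r, rho0, n):
    """(z - zeta)/sqrt(1/(n - 3)) for testing the hypothesis rho = rho0."""
    return (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)

def confidence_interval(r, n, b):
    """Interval for rho: transform z +/- b/sqrt(n - 3) back with tanh."""
    z, half = fisher_z(r), b / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

def two_sample_ratio(r1, n1, r2, n2):
    """Critical ratio for the hypothesis rho_1 = rho_2."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

print(round(critical_ratio(0.70, 0.0, 84), 1))      # Example 1a
print(round(critical_ratio(0.70, 0.30, 84), 1))     # Example 1b
print(tuple(round(v, 2) for v in confidence_interval(0.70, 84, 2.58)))
print(round(two_sample_ratio(0.80, 53, 0.50, 53), 2))
```

The multiplier b = 2.58 is the value used in the text for the .01 interval; other levels would use the corresponding normal deviate.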

12.6. The Scatter Plot. A simple device of great usefulness
in the understanding of bivariate distributions and in the
interpretation of measures of relation is the scatter plot. A scatter
plot is made by placing a mark in the xy plane for each member
of the sample, locating the mark at the value of that member.
Suppose, for example, that the X values (or class intervals) are
$x_1, x_2, \ldots$ and the Y values are $y_1, y_2, \ldots$. A scatter
plot made by placing a dot at the point $(x_i, y_j)$ for each member
$(X_i, Y_j)$ might look like Fig. 12.6.1.

Fig. 12.6.1. A scatter plot in which the mean regression curves are approximately
linear.

We can tell at a glance from
this scatter plot that the mean regression curves are at least
approximately linear, and therefore we are justified in using the
p-mcc as a measure of relationship. The research worker should
not rely on the numerical value of r, however, but should build
up a set of visual associations of scatter plots with the corresponding
numerical values of r, because the relationship is not
a simple one. The formula $s_e^2 = s_y^2(1 - r^2)$ indicates that the
variance about the linear regression curve is $1 - r^2$ of the variance
of Y. Values of $s_e^2/s_y^2$ for a few values of r are shown in the
following table:

Thus the difference between r's of .80 and .90 is much greater
than that between .10 and .20 in terms of reduction in $s_e^2$. The
square of r gives the proportion by which $s_y^2$ is reduced in obtaining
$s_e^2$.

Since the variance is more difficult to interpret than the standard
deviation, it is often convenient to think in terms of $s_e$ rather
than $s_e^2$. Values of $s_e/s_y$ for a few values of r are given in the
following table:

Since $s_e = s_y[1 - (1 - \sqrt{1 - r^2})]$, the proportion by which
$s_y$ is reduced in obtaining $s_e$ is $1 - \sqrt{1 - r^2}$. As

$$\sin x = \sqrt{1 - \cos^2 x}$$

a table of trigonometric functions is convenient for finding
$\sqrt{1 - r^2}$ or $1 - \sqrt{1 - r^2}$.
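The quantities discussed here follow directly from $s_e^2 = s_y^2(1 - r^2)$ and are easy to tabulate. The Python sketch below is an illustration only; the particular r values are chosen for convenience and are not copied from the tables in the text.

```python
import math

def variance_ratio(r):
    """s_e^2 / s_y^2 = 1 - r^2."""
    return 1 - r ** 2

def sd_ratio(r):
    """s_e / s_y = sqrt(1 - r^2)."""
    return math.sqrt(1 - r ** 2)

# The r values below are illustrative choices, not the text's own table.
for r in (0.10, 0.20, 0.50, 0.80, 0.90):
    print(f"r = {r:.2f}  s_e^2/s_y^2 = {variance_ratio(r):.2f}  "
          f"s_e/s_y = {sd_ratio(r):.3f}  "
          f"reduction in s_y = {1 - sd_ratio(r):.3f}")
```

Comparing adjacent rows makes the point of the paragraph concrete: equal steps in r do not correspond to equal reductions in the error variance.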

As indicated in Sec. 12.5, if r is used to test a hypothesis about
$\rho$ then it is necessary that the sample be drawn from a normal
bivariate distribution. As stated in Sec. 12.5, the marginal
distributions of a normal bivariate distribution are themselves
normal; therefore, each of the sample marginal distributions
should be examined for approximate normality and tested, if the
sample is sufficiently large, by the $\chi^2$ test. If the mean regression is
linear and each of the marginal distributions appears to be
approximately normal, it is probably fairly safe to assume that
the population sampled is a normal bivariate distribution, and
to test hypotheses about $\rho$.

Frequently r is interpreted in terms of the adequacy of prediction,
that is, in terms of the standard error of estimate $s_e$.
Obviously to state a single figure for the adequacy of prediction
assumes that the variance of errors is equal from one value of X
to another. Since in a normal bivariate distribution all conditional
distributions of Y have the same variance $[\sigma_y^2(1 - \rho^2)]$*
and since the mean regression of Y on X is linear, it follows that
with a normal bivariate distribution the assumption of equality
of variance of errors is satisfied.

The interpretation of $s_e$ depends in part upon the form of the
distribution of errors. If the distribution of errors is normal,
the table of $N(0,1)$ can be used in statements about what proportions
of errors fall within certain limits. For any sample the
distribution of errors should be found and examined for normality
before statements of this kind are made. In any normal
bivariate distribution, the distribution of errors (that is, deviations
from the mean regression curve) is normal.

12.7. Computation of r. For computational purposes, note
that the following sums from a bivariate sample provide all the
information needed to calculate r: $n$, $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$.
First we calculate $m_x$, $m_y$, $s_x$, and $s_y$. Then we obtain

$$\text{Cov}_{xy} = \frac{\sum (x - m_x)(y - m_y)}{n} = \frac{\sum xy}{n} - \frac{m_x\sum y}{n} - \frac{m_y\sum x}{n} + \frac{nm_xm_y}{n}$$

$$= \frac{\sum xy}{n} - m_xm_y - m_ym_x + m_xm_y = \frac{\sum xy}{n} - m_xm_y$$

Next we find $r = \text{Cov}_{xy}/s_xs_y$. Since we usually want to calculate
$m_x$, $m_y$, $s_x$, and $s_y$ anyway, the order indicated is usually the most
efficient. If for any reason we want to obtain r directly, or to
check our calculation, note that

$$\frac{\sum xy}{n} - m_xm_y = \frac{\sum xy}{n} - \frac{\sum x\sum y}{n^2} = \frac{n\sum xy - \sum x\sum y}{n^2}$$

and

$$s_xs_y = \sqrt{\left(\frac{\sum x^2}{n} - \frac{(\sum x)^2}{n^2}\right)\left(\frac{\sum y^2}{n} - \frac{(\sum y)^2}{n^2}\right)} = \frac{1}{n^2}\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}$$

Therefore

$$r = \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

If the size of the bivariate sample is extremely large, it may be
convenient to divide each of the marginal distributions into class
intervals, considering each score as falling at the mid-point of
the interval, as described in Sec. 4.8. It may further be convenient
to transform each value by a linear transformation; the
student was asked to prove (Exercise 12.3.6) that the transformed
variates yield an r equal to that of the original variates.

* The property of equal variances of conditional distributions is called
homoscedasticity.
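The direct formula for r needs only the six sums listed at the start of this section. The Python sketch below is illustrative, with hypothetical function names; the five pairs used for demonstration are the scores of applicants 1 to 5 in Exercise 12.7.1.

```python
import math

def sums(pairs):
    """Return n, sum x, sum y, sum x^2, sum y^2, sum xy for (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return n, sum_x, sum_y, sum_x2, sum_y2, sum_xy

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """r = (n Sxy - Sx Sy) / sqrt([n Sxx - Sx^2][n Syy - Sy^2])."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Aptitude and productivity scores of applicants 1-5 (Exercise 12.7.1).
pairs = [(9, 23), (17, 35), (20, 29), (19, 33), (22, 39)]
print(r_from_sums(*sums(pairs)))
```

Because the formula works from running totals, the six sums can be accumulated in a single pass through the data, which is the practical advantage the text has in mind.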


EXERCISES

12.7.1. In a study of the relation of scores on an aptitude test to productivity
(in a factory) after three months of training, the 200 pairs of scores shown
in the accompanying table were obtained by testing 200 randomly selected
applicants and later measuring their productivity.

a. On the basis of criteria available to the student, is one justified in
considering r a measure of the relationship?

b. Compute r.

c. Find the linear mean regression of Y on X.

d. Find the linear mean regression of X on Y.

e. If a new applicant has an aptitude score of 43, what would you predict his
productivity would be after three months of training?

f. On the basis of criteria available to the student, is one justified in testing
hypotheses about $\rho$? Make the computations necessary to justify your
answer.

g. Test the hypothesis that $\rho = 0$.

h. Find the .05 confidence interval for $\rho$.

i. What would you expect the mean error to be in predicting a large number
of productivities? Within what interval would about 50 per cent of the
errors lie? Would the error tend to vary systematically with x?
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Applicant 

Aptitude 
score (X) 

Produc¬ 
tivity (F) 

Applicant 

Aptitude 
score ( X) 


1 

9 

23 

51 

37 


2 

17 

35 

52 

36 


3 

20 

29 

53 

33 


4 

19 

33 

54 

36 


5 

22 

39 

55 

35 


6 

20 

43 

56 

37 


7 

23 

32 

57 

34 


8 

26 

36 

58 

33 


9 

25 

39 

59 

36 


10 

24 

43 

60 

35 


11 

23 

46 

61 

35 


12 

27 

45 

62 

33 


13 

25 

44 

63 

35 


14 

25 

51 

64 

37 


15 

23 

49 

65 

41 


16 

27 

56 

66 

40 


17 

31 

25 

67 

40 


18 

30 

41 

68 

38 


19 

32 

40 

69 

39 


20 

28 

42 

70 

42 


21 

31 

39 

71 

40 


22 

30 

41 

72 

41 


23 

29 

46 

73 

39 


24 

30 

47 

74 

42 


25 

32 

47 

75 

40 


26 

31 

45 

76 

38 


27 

32 

47 

77 

41 


28 

28 

43 

78 

40 


29 

29 

45 

79 

42 


30 

31 

51 

80 

39 


31 

30 

51 

81 

41 


32 

32 

50 

82 

39 


33 

31 

54 

83 

40 


34 

29 

56 

84 

42 


35 

30 

57 

85 

42 


36 

33 

36 

86 

40 


37 

35 

41 

87 

41 


38 

35 

40 

88 

39 


39 

34 

41 

89 

41 


40 

35 

39 

90 

40 


41 

37 

44 

91 

40 


42 

36 

47 

92 

38 


43 

37 

47 

93 

41 


44 

33 

45 

94 

42 


45 

35 

43 

95 

39 


46 

37 

45 

96 

40 


47 

36 

46 

97 

41 


48 

34 

46 

98 

41 


49 

36 

51 

99 

39 


50 

35 

49 

100 

42 



Produc¬ 
tivity (F) 


50 

50 
52 

51 
49 

51 

54 

55 

55 
57 

56 
59 
62 
71 

41 

42 

45 

46 

44 

45 

52 

49 
51 

50 

50 

51 

48 

52 

50 

51 

51 

52 

49 

49 
51 

50 
55 

53 

55 

56 

55 

53 

56 
55 

54 

54 

55 
61 
59 
58 
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Applicant 


101 

102 

103 

104 

105 

106 

107 

108 

109 

110 
111 
112 

113 

114 

115 

116 

117 

118 

119 

120 
121 
122 

123 

124 

125 

126 

127 

128 

129 

130 

131 

132 

133 

134 

135 

136 

137 

138 

139 

140 

141 

142 

143 

144 

145 

146 

147 

148 

149 

150 


Aptitude 
score (X) 


40 

45 
44 

46 

46 

44 

45 
45 

47 
45 

44 
43 

43 

45 

44 

46 

45 

47 

45 

46 

45 

43 

46 
45 

47 

45 
47 
47 

46 

44 

47 

43 
46 

46 

45 

47 

44 

46 

45 
45 
43 

50 

49 

51 

51 

48 

50 
50 

52 
50 


Produc¬ 
tivity (7) 


70 

36 

46 

45 

45 

44 

51 

51 

49 

52 

50 

51 
48 
50 

53 
55 

54 

54 

55 

57 
55 

53 

54 
53 

55 

53 

56 

56 
55 

54 

59 
62 

60 

59 
61 

58 

60 

63 
66 

64 
64 
41 
45 
44 
51 
49 
49 

54 

57 

55 


Applicant 


151 

152 

153 

154 

155 

156 

157 

158 

159 

160 
161 
162 

163 

164 

165 

166 

167 

168 

169 

170 

171 

172 

173 

174 

175 

176 

177 

178 

179 

180 
181 
182 

183 

184 

185 

186 

187 

188 

189 

190 

191 

192 

193 

194 

195 

196 

197 

198 

199 

200 


Aptitude 
score ( X ) 


49 

51 

51 

49 

52 

50 
48 

48 

50 

49 

51 

48 

52 

50 

50 

51 

49 
55 
57 
54 

54 

53 

55 

56 

55 

53 

57 

54 

54 

56 

55 
54 
53 

56 
60 

58 
61 
60 

59 

59 

60 
62 

67 

64 

65 
64 
71 

68 
74 
73 


Productivity (Y)


59 
61 
62 

60 
60 

59 
58 

60 
58 
61 
66 
65 

64 

63 

58 
71 
68 
44 
51 

48 

57 
55 

59 
61 

58 

59 

60 
60 

65 

64 

64 
71 
68 
73 

49 
57 
60 

65 
64 

64 

70 
73 

66 
69 
59 

71 

65 
73 
67 
78 
j. What would you expect the mean error to be in predicting a large number
of productivities if the X scores were not available [but E(Y) were known]?
What would the standard deviation of the errors be?

k. What is the probability that a predicted productivity, using the regression
equation, will have an error of +10, that is, be 10 points higher than the
actual productivity? State clearly the assumptions involved.

l. Discuss the problem of random sampling in this case. 

12.7.2. A clinician is interested in whether three measures of neuroticism,
X, Y, and Z, are interrelated. He secures the three measures on each of
50 patients and thus obtains 150 pairs of measures, there being 3 for each
patient: (X,Y), (X,Z), and (Y,Z). He then computes r for these 150 pairs
and finds that he can reject the hypothesis that $\rho = 0$. Why is his test
invalid?

12.7.3. If the standard error of estimating Y from X is the same regardless
of the value of X and if $\rho \neq 0$ and the mean regression of Y on X is linear, how
would $\rho$ of a subpopulation with a smaller range of X compare with $\rho$ for the
entire population? For example, if the mean regression of academic averages
on intelligence-test scores is linear and if $\rho \neq 0$ and the standard error
of estimate is constant, would you expect $|\rho|$ to be higher for a very heterogeneous
group of students or a relatively homogeneous group of students with
respect to intelligence-test scores?

12.7.4. The “reliability” of a measure is sometimes defined by obtaining 
the same measure (that is, using the same or equivalent measuring procedures) 
at two different times on each member of a large group and then finding r for 
the paired measures. If the second measure tends to be larger than the 
first measure by a constant amount, will this discrepancy have any effect 
on the size of r? If r = 1, what has remained constant in the two sets of 
measurements? 

12.7.5. A rather famous argument in psychology arose when investigations
showed that under certain conditions the reliabilities of maze performance
scores were quite low. Psychologist A maintained that these low reliabilities
cast doubt on all results which had been reported using maze scores.
Psychologist B maintained that certain results, which had been shown to be
statistically significant, were valid from a statistical point of view despite the
possibly low reliabilities of the measures used. Who was right? (Hint: Consider
the case of having been able to reject the hypothesis that two sets of
measures were drawn randomly from populations having the same mean.)

12.7.6. Try to derive a method of finding confidence intervals for $\rho_1 - \rho_2$,
on the basis of the theorem and corollaries in Sec. 12.5. What is the difficulty?



Chapter 13 

F DISTRIBUTIONS AND THE 
ANALYSIS OF VARIANCE 


13.1. Definition of F. Let X be distributed as $\chi^2_m$ and Y be
independently distributed as $\chi^2_n$; then $F_{m,n}$ is defined as

$$F_{m,n} = \frac{X/m}{Y/n}$$

In other words, $F_{m,n}$ is the ratio of two independent variates
each distributed as chi square, each divided by its number of
degrees of freedom. Note that $F_{m,n}$ is always positive.

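13.2. Distribution of F. We state without proof¹ that the distribution
of $F_{m,n}$ is given by

$$f(F_{m,n}) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} F_{m,n}^{(m/2)-1} \left(1 + \frac{m}{n}F_{m,n}\right)^{-(m+n)/2} \qquad (F_{m,n} > 0)$$

A typical F density function is illustrated in Fig. 13.2.1.

[Fig. 13.2.1. A typical F density function.]

1 A proof is given in the Appendix, Sec. B.24.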
Let $F_1$ and $F_2$ be constants such that $F_1 < 1 < F_2$. Note that
$F_{m,n} = 1/F_{n,m}$; therefore, if $F_{m,n} < F_1$, $F_{n,m} > 1/F_1$ and vice
versa. Thus prob $(F_{m,n} < F_1)$ = prob $(F_{n,m} > 1/F_1)$. Therefore,

$$\text{prob}\,[(F_{m,n} < F_1) \text{ or } (F_{m,n} > F_2)] = \text{prob}\left(F_{n,m} > \frac{1}{F_1}\right) + \text{prob}\,(F_{m,n} > F_2)$$

Tables of $F_p$ such that

$$\int_0^{F_p} f(F_{m,n})\, dF_{m,n} = 1 - p$$

for 1 - p = .90, .95, .975, .99, .995, and for various values of m
and n are given in the Appendix; thus the table of $F_p$ gives the
values $1/F_1$ and $F_2$ for the probabilities usually wanted. Note
that for small p, both $F_2$ and $1/F_1$ are greater than 1. It is convenient,
therefore, to compute $F_{m,n}$ or $F_{n,m}$, whichever is greater
than 1, and then see whether the computed F is greater than
$F_p$, using the same p in either case. When this procedure is
used, any hypothesis rejected because F is greater than $F_p$ must
be rejected at the 2p level of confidence, as a two-tailed test has
been used. By deciding beforehand which F to compute, a one-tailed
test can be made.
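The reciprocal relation prob $(F_{m,n} < F_1)$ = prob $(F_{n,m} > 1/F_1)$ can be checked numerically. The following Python sketch is an illustration only and is not part of the original text. It assumes the standard form of the F density; the names `f_density` and `f_cdf` and the use of Simpson's rule are choices made for this example. Only the standard library is needed.

```python
import math

def f_density(x, m, n):
    """Standard density of F with m and n degrees of freedom (assumes m > 2)."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m / n))
    return math.exp(log_c + (m / 2 - 1) * math.log(x)
                    - ((m + n) / 2) * math.log(1 + m * x / n))

def f_cdf(x, m, n, steps=4000):
    """prob(F_{m,n} < x), integrating the density over [0, x] by Simpson's rule."""
    h = x / steps  # steps must be even
    total = f_density(0.0, m, n) + f_density(x, m, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, m, n)
    return total * h / 3

# Hypothetical choice for illustration: m = 4, n = 6, F_1 = 0.5.
lower_tail = f_cdf(0.5, 4, 6)          # prob(F_{4,6} < 0.5)
upper_tail = 1 - f_cdf(2.0, 6, 4)      # prob(F_{6,4} > 1/0.5)
print(lower_tail, upper_tail)
```

The two printed probabilities should agree to within the integration error, which is the content of the relation used above to read both tails from a single table of $F_p$.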

13.3. The Ratio of Unbiased Estimates of Two Population
Variances. Let two random samples of sizes $n_1$ and $n_2$ respectively
be drawn from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. By Theorem
10.6 the two statistics $n_1 s_1^2/\sigma_1^2$ and $n_2 s_2^2/\sigma_2^2$ are independently
distributed as $\chi^2_{n_1-1}$ and $\chi^2_{n_2-1}$ respectively. Therefore, by
definition

$$\frac{\dfrac{n_1 s_1^2/\sigma_1^2}{n_1 - 1}}{\dfrac{n_2 s_2^2/\sigma_2^2}{n_2 - 1}} = F_{n_1-1,\,n_2-1}$$

Note that this ratio can be written

$$\frac{n_1 s_1^2/(n_1 - 1)}{n_2 s_2^2/(n_2 - 1)} \cdot \frac{\sigma_2^2}{\sigma_1^2}$$

As we have stated earlier,¹ $ns^2/(n - 1)$ is an unbiased estimate of $\sigma^2$.
Thus the ratio of two independent unbiased estimates of population
variances, multiplied by the inverse ratio of the variances
which have been estimated, is distributed as $F_{n_1-1,\,n_2-1}$.

1 A proof is given in the Appendix, Sec. B.20.

13.4. The Special Case $\sigma_1^2 = \sigma_2^2$. In the very important case
that the two population variances are equal, the ratio of the
unbiased variance estimates alone (multiplied by one) is distributed
as $F_{n_1-1,\,n_2-1}$. This case arises when, for example, two
independent samples are drawn from the same population.
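As an illustration of Secs. 13.3 and 13.4, the following Python sketch computes the ratio of the two unbiased estimates $ns^2/(n-1)$, multiplied by the inverse ratio of the hypothesized population variances. It is a minimal sketch, not part of the original text: the function names and the sample figures are hypothetical, and $s^2$ is taken, as in the text, to be the sample variance with divisor n.

```python
def unbiased_variance(n, s_sq):
    """n*s^2/(n-1): unbiased estimate of the population variance,
    where s_sq is the sample variance computed with divisor n."""
    return n * s_sq / (n - 1)

def variance_ratio(n1, s1_sq, n2, s2_sq, sigma1_sq=1.0, sigma2_sq=1.0):
    """Return (F, d.f. numerator, d.f. denominator) for the ratio of unbiased
    estimates times sigma2^2/sigma1^2 (the hypothesized variances).
    With the defaults, this is the special case sigma1^2 = sigma2^2."""
    est1 = unbiased_variance(n1, s1_sq)
    est2 = unbiased_variance(n2, s2_sq)
    F = (est1 / est2) * (sigma2_sq / sigma1_sq)
    return F, n1 - 1, n2 - 1

# Hypothetical figures: two samples of 10 with sample variances 9 and 4.
F_equal, df1, df2 = variance_ratio(10, 9.0, 10, 4.0)
# Same data under the hypothesis sigma1^2 = 2 * sigma2^2.
F_twice, _, _ = variance_ratio(10, 9.0, 10, 4.0, sigma1_sq=2.0, sigma2_sq=1.0)
print(F_equal, df1, df2, F_twice)
```

The computed F would then be compared with the tabled $F_p$ for the stated degrees of freedom, as described in Sec. 13.2.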

EXERCISES 

13.4.1. A sample of 17 drawn at random from N(0,14) yields a standard
deviation of 5. A sample of 10 drawn at random from N(5,10) yields a
standard deviation of 4.

a. Compute F.

b. Is this value reasonable?

c. Would there be any point in computing F under these conditions?

13.4.2. A sample of 10 drawn at random from N(0,6) yields a standard
deviation of 3. A sample of 10 drawn at random from N(4,10) yields a standard
deviation of 2. Compute F. How would you interpret this value
of F?

13.4.3. A sample of 25 drawn randomly from $N(\mu_1, \sigma_1^2)$ has a variance of 40.
A sample of 49 drawn randomly from $N(\mu_2, \sigma_2^2)$ has a variance of 15. Test the
hypothesis that

a. $\sigma_1^2 = \sigma_2^2$

b. $2\sigma_1^2 = \sigma_2^2$

c. $\sigma_1^2 < \sigma_2^2$

13.4.4. In the use of certain precision instruments, the mean error can 
always be made zero by proper calibration; therefore, the critical measure of 
the adequacy of such an instrument is the variability of the errors made. In 
a comparison of two different instruments, 25 measurements were taken with 
instrument A and 35 with instrument B (all by the same operator) and the 
error of each measurement was found by a more accurate and precise (but 
also more laborious) procedure. The errors of instrument A had a variance 
of 30; those of B had a variance of 20. Assuming the population of errors to 
be normally distributed, test the hypothesis that 

a. the variances of errors of A and B are equal.

b. the variance of A is twice that of B.

c. the variance of A is no more than that of B.

What does the assumption of random sampling mean in this application?
What are some of the experimental procedures that would violate this assumption?
Exactly what populations have been sampled and what, from a statistical
point of view, should be the scope of the conclusions drawn?
13.4.5. Two methods of manufacturing a certain product have been proposed
and set up experimentally. How can the hypothesis that the two
methods result in equal variabilities in the product be tested? List the assumptions
involved in making the test.

13.5. The Special Case $\mu_1 = \mu_2$; $\sigma_1^2 = \sigma_2^2$. When not only the
population variances but also the population means are equal
(and the populations are normal), three independent estimates of
the population variance can be obtained from two samples of the
same size n. Two of these are $s_1^2 n/(n - 1)$ and $s_2^2 n/(n - 1)$, as
before. The third is obtained from Theorem 9.5.2, that the
mean m of a sample of size n drawn from $N(\mu, \sigma^2)$ will be normally
distributed with mean $\mu$ and variance $\sigma^2/n$. Thus $m_1$ and
$m_2$, the means of the two samples, can be considered a sample of
two from $N(\mu, \sigma^2/n)$. Thus an unbiased estimate of $\sigma^2/n$ is the
variance of the two sample means, $s_m^2$, times 2/(2 - 1) = 2.
Thus an unbiased estimate of $\sigma^2$ is $2ns_m^2$. We state without
proof¹ that this estimate is independent of the two others, that is,
that $f(2ns_m^2, s_1^2) = f(2ns_m^2)\,f(s_1^2)$, etc. Furthermore,
the ratio of $2ns_m^2$ to either $s_1^2 n/(n - 1)$ or $s_2^2 n/(n - 1)$ is distributed
as $F_{1,\,n-1}$. This follows from Theorem 10.6, for by that
theorem $\dfrac{2s_m^2}{\sigma^2/n}$ is distributed as $\chi^2_1$; therefore,

$$\frac{\dfrac{2s_m^2}{\sigma^2/n}}{\dfrac{s_1^2 n/(n - 1)}{\sigma^2}} = \frac{2ns_m^2}{s_1^2 n/(n - 1)}$$

is distributed as $F_{1,\,n-1}$.

Since in this special case $\sigma_1^2 = \sigma_2^2$, a better estimate of $\sigma^2$ can be
obtained by combining $s_1^2 n/(n - 1)$ and $s_2^2 n/(n - 1)$ into a
single estimate, which is the arithmetic mean of the two,
$(s_1^2 + s_2^2)n/2(n - 1)$. It can be shown¹ that

$$\frac{2ns_m^2}{(s_1^2 + s_2^2)n/2(n - 1)}$$

is distributed as $F_{1,\,2n-2}$.

1 Proofs are given by Cramér (3) and Wilks (12).
Note that

$$2ns_m^2 = 2n\,\frac{\left(m_1 - \dfrac{m_1 + m_2}{2}\right)^2 + \left(m_2 - \dfrac{m_1 + m_2}{2}\right)^2}{2} = \frac{n(m_1 - m_2)^2}{2}$$

Therefore

$$\frac{2ns_m^2}{(s_1^2 + s_2^2)n/2(n - 1)} = \frac{(m_1 - m_2)^2}{(s_1^2 + s_2^2)/(n - 1)} = \left(\frac{m_1 - m_2}{\sqrt{(s_1^2 + s_2^2)/(n - 1)}}\right)^2 = t^2$$

(See Chap. 11.) Thus, in this special case, $F = t^2$. As a matter
of fact, it follows from the definitions of t and F that $t_n^2$ is distributed
as $F_{1,n}$. This relation is indicated in the tables in
Appendix D, as the entry for $F_{1,n}$ for a given confidence level
(one-tailed test) is the square of $t_n$ for that same confidence level
(two-tailed test).

If we wish to use F to test the hypothesis that two populations
have equal means by drawing samples of size n, we can decide
beforehand to compute $F_{1,n}$ instead of $F_{n,1}$ and thus use a one-tailed
test. The reasonableness of this becomes apparent when
we consider that a significantly large value of $F_{n,1}$ would be
difficult to interpret; about the only interpretation would be
that some error in procedure was entering in to tend to make the
difference between the two means smaller than it would ordinarily
be under conditions of random sampling.
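The identity $F = t^2$ for two samples of equal size can be verified directly from raw data. The following Python sketch is an illustration only and is not part of the original text. The two samples are hypothetical, the function name `f_and_t` is chosen here, and variances are computed with divisor n (`statistics.pvariance`), in keeping with the text's definition of $s^2$.

```python
import math
from statistics import fmean, pvariance

def f_and_t(sample1, sample2):
    """Return (F, t) for two samples of the same size n, where
    F = 2*n*s_m^2 / [(s_1^2 + s_2^2) * n / (2*(n - 1))] and
    t = (m_1 - m_2) / sqrt((s_1^2 + s_2^2) / (n - 1))."""
    n = len(sample1)
    if n != len(sample2):
        raise ValueError("both samples must have the same size")
    m1, m2 = fmean(sample1), fmean(sample2)
    s1_sq, s2_sq = pvariance(sample1), pvariance(sample2)  # divisor n
    s_m_sq = pvariance([m1, m2])          # variance of the two sample means
    between = 2 * n * s_m_sq              # estimate based on the two means
    within = (s1_sq + s2_sq) * n / (2 * (n - 1))  # combined within-sample estimate
    F = between / within
    t = (m1 - m2) / math.sqrt((s1_sq + s2_sq) / (n - 1))
    return F, t

# Hypothetical data, two samples of 5.
F, t = f_and_t([4, 6, 5, 7, 8], [7, 9, 8, 10, 6])
print(F, t, t * t)
```

Here F has 1 and 2n - 2 degrees of freedom and t has 2n - 2 degrees of freedom, so the printed F and $t^2$ should coincide apart from rounding.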


EXERCISES 

13.5.1. A sample of 9 drawn from $N(0, \sigma^2)$ yields a mean of 1 and a variance
of 8. Another sample of 9 drawn from the same population yields a mean of 3
and a variance of 7.

a. Compute F, using as numerator the estimate based on the two means.

b. How would you interpret this value of F?

c. Compute t and compare this value with that for F.

d. If F had been “significantly” large, what conclusion, if any, would you
have drawn?
13.5.2. A sample of ... drawn from $N(\mu_1, \sigma_1^2)$ yields a mean of 5 and a
variance of .... A sample of ... drawn from $N(\mu_2, \sigma_2^2)$ yields a mean of 8 and a
variance of 25. Test the hypothesis that

a. $\sigma_1^2 = \sigma_2^2$.

b. $\mu_1 = \mu_2$. Does your test presuppose that $\sigma_1^2 = \sigma_2^2$? If so, does your
first test give you confidence that this is true?

c. $\mu_1 = \mu_2$, using t. Compare with F.

13.5.3. A controversy arose as to whether two different sets of skulls, found
a short distance apart, came from the same prehistoric racial stock. The
cephalic index of each skull was found. Outline the steps and assumptions
involved in comparing the mean cephalic indices.

13.6. The Case of Several Groups. Let a sample of size r be
drawn at random from each of k populations, each having the
same normal distribution. Then one estimate of $\sigma^2$ is

$$\frac{\sum_{i=1}^{k} s_i^2\,\dfrac{r}{r - 1}}{k}$$

the mean of the k estimates; another estimate is obtained from
the variance of the means, $s_m^2$, and is $\dfrac{rk}{k - 1}s_m^2$ (since $\dfrac{k}{k - 1}s_m^2$ is
an unbiased estimate of $\sigma^2/r$). We state without proof¹ that
these two estimates are independent and that their ratio

$$\frac{\dfrac{rk}{k - 1}\,s_m^2}{\dfrac{\sum_{i=1}^{k} s_i^2\,\dfrac{r}{r - 1}}{k}}$$

is distributed as $F_{k-1,\,rk-k}$.
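The ratio just described can be computed from the group means and variances alone. The following Python sketch is an illustration only and is not part of the original text. The function name and the figures are hypothetical, and the variances are assumed to have divisor r, as in the text.

```python
def f_from_group_summaries(means, variances, r):
    """F for k groups of equal size r, from group means and group
    variances (divisor r). Returns (F, k - 1, rk - k)."""
    k = len(means)
    grand = sum(means) / k
    s_m_sq = sum((m - grand) ** 2 for m in means) / k     # variance of the means
    between = r * k * s_m_sq / (k - 1)                    # estimate from the means
    within = sum(v * r / (r - 1) for v in variances) / k  # mean of the k estimates
    return between / within, k - 1, r * k - k

# Hypothetical summaries: 3 groups of 6 observations each.
F, df_num, df_den = f_from_group_summaries([10, 12, 14], [4, 5, 3], r=6)
print(F, df_num, df_den)
```

For these figures the estimate from the means is 24 and the mean of the within-group estimates is 4.8, so that F = 5 with 2 and 15 degrees of freedom.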

When $F_{k-1,\,rk-k}$ is computed from k samples from k different
normally distributed populations and is significantly large we
can reject the hypothesis that the populations all have the same
mean and variance. Ordinarily we are more interested in differences
between means than in differences between variances. To
conclude from a significant value of $F_{k-1,\,rk-k}$ that the population
means are different implies the assumption that the population
variances are equal. This is the assumption of homogeneity of
variance. We can test for homogeneity by making use of the
following theorem:

1 Proofs are given by Cramér (3) and Wilks (12).

Theorem 13.6. Let $\hat{s}_1^2, \hat{s}_2^2, \ldots, \hat{s}_k^2$ represent independent
unbiased estimates of the population variance derived from k
samples, each of size r, drawn at random from normal populations
having the same variance. Let

$$\hat{s}^2 = \frac{\sum_{i=1}^{k} \hat{s}_i^2}{k}$$

and

$$D = \frac{3kr}{3kr + k + 1}$$

Then

$$H = 2.3026\,Dr\left(k \log_{10} \hat{s}^2 - \sum_{i=1}^{k} \log_{10} \hat{s}_i^2\right)$$

is distributed approximately as $\chi^2_{k-1}$.

When H is significantly large (as given by tables of $\chi^2_{k-1}$) the
hypothesis of homogeneity of variance must be rejected. When
H is small, the hypothesis has not, of course, been proved, but
somewhat greater confidence can be placed in the conclusion, in
case a significant F is obtained, that the population means differ,
that is, somewhat greater confidence than in the absence of the
test.
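The statistic H of Theorem 13.6 is easily computed. The following Python sketch is an illustration only and is not part of the original text. It follows the formula for D and H as read from the theorem above, the function name and the figures are hypothetical, and the inputs are assumed to be the unbiased estimates (each $s_i^2 r/(r-1)$).

```python
import math

def homogeneity_H(unbiased_estimates, r):
    """H of Theorem 13.6 for k unbiased variance estimates, each from a
    sample of size r. Returns (H, k - 1); H is referred to chi square
    with k - 1 degrees of freedom."""
    k = len(unbiased_estimates)
    mean_est = sum(unbiased_estimates) / k
    D = 3 * k * r / (3 * k * r + k + 1)
    bracket = (k * math.log10(mean_est)
               - sum(math.log10(v) for v in unbiased_estimates))
    return 2.3026 * D * r * bracket, k - 1

# Hypothetical estimates from 4 samples of size 10.
H, df = homogeneity_H([12.0, 20.0, 9.0, 15.0], r=10)
# When all estimates are equal, the bracket vanishes and H is zero.
H_equal, _ = homogeneity_H([5.0, 5.0, 5.0], r=10)
print(H, df, H_equal)
```

Since the logarithm of a mean is never less than the mean of the logarithms, H cannot be negative, and it grows as the estimates become more unequal.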

If we are willing to assume normality and homogeneity of
variance and we have obtained a significant value of F, then we
can reject the hypothesis that the population means are equal.
This does not mean, however, that we can conclude that all the
means are different; it may be that the difference between a
single pair of means is responsible for the large value of F. It
has been customary, after finding a significant value of F, to
test the difference between any two means by the t test. Strictly
speaking, this procedure is invalid, as some of the probabilities
given in the t tables will be spuriously small, because more than
one comparison is involved and the comparisons are interdependent
in a way that makes the exact computation of probabilities
for t, following F, a hopelessly complicated and laborious procedure.
For practical purposes, it is probably safe to set some
significance level for F and the same significance level for t, then,
if F is significant at the level set, compute t's and interpret them
simply as to whether they do or do not reach the level set. In
other words, if the .05 level has been set, a value of 3.2 for t with
15 degrees of freedom should be interpreted as significant at the
.05 level, not the .01 level indicated in the t tables. This procedure
can by no means be rigorously justified, and if an important
decision must be made on the basis of a statistical test a
wise procedure is to gather additional data for any comparison
which has yielded a significant value of t, following F.

In making the t test after F, a somewhat more powerful test
can be made by using the assumption of homogeneity and taking

$$t = \frac{m_1 - m_2}{\sqrt{\dfrac{2\sum s_i^2}{k(r - 1)}}}$$

with rk - k degrees of freedom instead of

$$t = \frac{m_1 - m_2}{\sqrt{(s_1^2 + s_2^2)/(r - 1)}}$$

with 2r - 2 degrees of freedom.

When F has a value which does not meet our requirement of
significance, it will ordinarily be inappropriate to make any t
tests. However, when it has been decided beforehand that a
particular comparison is of special importance, then that comparison,
by means of t, should be made. If more than one such
comparison is to be made, the same problem of spuriously low
levels of confidence is encountered as that discussed previously
and the same caution should be exercised.

A procedure which allows one to compare individual means
and which avoids the pitfalls of t after F has been given by John
W. Tukey.¹ In outlining the procedure, the justification of
which is far beyond the scope of this book, we shall take as an
example the following hypothetical data:

Population sampled      Sample      Sample
(sample size 12)        mean        variance

        A                 22           15
        B                 31           13
        C                 18           16
        D                 26           14
        E                 23           17
        F                 32           16
        G                 22           12

1 J. W. Tukey, Comparing individual means in the analysis of variance,
Biometrics, 5: 99-114 (1949).

Step 1. Choose a level of significance.

For our example we choose .05.

Step 2. Calculate the difference which would have been
significant if there were but two samples.

Making use of the assumption of homogeneity of variance, the
estimated standard deviation (of the sampling distribution) of a
single mean, $\hat\sigma_m$, is

$$\hat\sigma_m = \sqrt{\frac{\sum s_i^2}{k(r - 1)}} = \sqrt{\frac{103}{77}} = 1.1566$$

We set $t\,[= (m_1 - m_2)/\sqrt{2}\,\hat\sigma_m]$ equal to 1.99, which is significant
at the .05 level for 77 degrees of freedom, and solve for $m_1 - m_2$,
obtaining 3.25.

Step 3. Arrange the means in order and consider any gap
larger than the value found in Step 2 as a group boundary.

The seven means in order are 18, 22, 22, 23, 26, 31, and 32, and
the differences 22 - 18 = 4 and 31 - 26 = 5 exceed 3.25, so
that we have divided the populations sampled into three groups:
18 (C) by itself; 22 (A), 22 (G), 23 (E), and 26 (D) together; and
31 (B) and 32 (F) together. If no group contains more than two
means, the process terminates.

Step 4. In each group of 3 or more means find the grand
mean M, the most straggling mean m (that is, the mean differing
most from the mean of the means in its group), and the difference
of these two divided by $\hat\sigma_m$. Convert this ratio into a statistic
having approximately the distribution N(0,1) by taking

$$\frac{\dfrac{m - M}{\hat\sigma_m} - \dfrac{6}{5}\log_{10} k'}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (k' > 3 \text{ means in the group})$$

or

$$\frac{\dfrac{m - M}{\hat\sigma_m} - \dfrac{1}{2}}{3\left(\dfrac{1}{4} + \dfrac{1}{n}\right)} \qquad (3 \text{ means in the group})$$

in which n is the number of degrees of freedom on which $\hat\sigma_m$ is
based. Separate off any straggling mean for which this ratio is
significant at the chosen significance level using tables of
N(0,1).

In our example, we have only one group of 3 or more means,
the group 22 (A), 22 (G), 23 (E), and 26 (D). The mean of
these 4 means is 23.25, and the most straggling mean is 26.
Thus $(m - M)/\hat\sigma_m$ = (26 - 23.25)/1.1566 = 2.37766. Further,
$\log_{10} 4$ = .60206. The ratio is thus

$$\frac{2.37766 - \left(\frac{6}{5}\right)(.60206)}{3\left(\frac{1}{4} + \frac{1}{77}\right)} = 2.10$$

As the .05 level for N(0,1) is 1.96, we separate 26 (D) from the
other members of the group.

Step 5. If Step 4 changed any group, repeat the process until
no further means are separated in the old groups. The means
separated off from one side of a group form a subgroup. If
there are any subgroups of 3 or more when no more means are
being separated from groups, apply the same process (Steps 4
and 5) to the subgroups.

In our example our old group has been reduced to 3 means,
and of these the most straggling mean is 23, which differs from
the mean 22.33 by only .67, yielding a ratio

$$\frac{(.67/1.1566) - \left(\frac{6}{5}\right)\log_{10} 3}{.78896} = .01$$

which of course is not significant.

Step 6. Calculate the sum of squares of deviations from the
group mean and the corresponding mean square for each group
or subgroup of 3 or more resulting from Steps 4 and 5. Using
$\hat\sigma_m^2$ as the denominator, calculate the variance ratios and apply
the F test.

We have one group of 3 or more, and calculating F for the
means, using $\hat\sigma_m^2$ = 1.34 as denominator (77 degrees of freedom)
and the variance of the 3 means, times 3/2, as numerator (2
degrees of freedom), we obtain F = (1/3)/1.34, which is less
than 1 and therefore not significantly large (nor is the reciprocal,
4.02, significantly large for 77 and 2 degrees of freedom respectively).
(Note that both numerator and denominator are, by
hypothesis, estimates of the variance of the sampling distribution
of the mean. To obtain estimates of the variance of the
population we would multiply numerator and denominator by
12, leaving the ratio and the numbers of degrees of freedom
unchanged.)

The procedure described allows us to make the following 
assertions, each at the .05 level of confidence: 

Population C has the lowest mean of the populations sampled. 

Each of the populations A, D, E, and G has a mean which is 
lower than the mean of B and also lower than the mean of F. 

Population D has a higher mean than at least one of the
populations A, G, and E.
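Steps 2 to 4 of the procedure can be traced for the hypothetical data of the example. The following Python sketch is an illustration only and is not part of the original text or of Tukey's paper. The function names are chosen here, the value 1.99 for t is taken from the text rather than computed, the absolute deviation is used to pick the most straggling mean, and the correction 1/2 for a group of exactly three means follows the formula as stated in Step 4.

```python
import math

means = {"A": 22, "B": 31, "C": 18, "D": 26, "E": 23, "F": 32, "G": 22}
variances = [15, 13, 16, 14, 17, 16, 12]    # sample variances, divisor r
r, k = 12, 7
df = k * (r - 1)                            # 77 degrees of freedom

# Step 2: estimated standard deviation of a single mean, and the gap
# that would be significant if there were but two samples.
sigma_m = math.sqrt(sum(variances) / df)
t_crit = 1.99                               # .05 level, 77 d.f., from the text
gap = t_crit * math.sqrt(2) * sigma_m

def split_by_gaps(ordered, gap):
    """Step 3: start a new group wherever adjacent means differ by more than gap."""
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] - prev[1] > gap:
            groups.append([])
        groups[-1].append(cur)
    return groups

ordered = sorted(means.items(), key=lambda item: item[1])
groups = split_by_gaps(ordered, gap)
group_labels = [[label for label, _ in g] for g in groups]

def straggler_ratio(values, sigma_m, df):
    """Step 4: most straggling mean and its approximately N(0,1) ratio."""
    kk = len(values)
    M = sum(values) / kk
    m = max(values, key=lambda v: abs(v - M))
    correction = 0.5 if kk == 3 else 1.2 * math.log10(kk)
    return m, (abs(m - M) / sigma_m - correction) / (3 * (0.25 + 1 / df))

straggler, ratio = straggler_ratio([22, 22, 23, 26], sigma_m, df)
print(round(sigma_m, 4), round(gap, 2), group_labels, straggler, round(ratio, 2))
```

The sketch yields the three groups of Step 3 and a ratio of about 2.10 for the mean 26, in agreement with the example.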


EXERCISES 

13.6.1. Six independent random samples of size 10 are drawn from a normally
distributed population. The means and variances of the six samples
are shown in the following table:

Sample      Mean      Variance

   1         100          9
   2          98          7
   3         102          8
   4         101          8
   5          99          9
   6         101          8

a. Compute F.

b. Is this value reasonable?

c. If F had turned out to be “significantly” large, what would your conclusion
be?
13.6.2. A teacher divided a class of 60 students at random into 4 groups of
15 each and gave each group a different objective examination (testing all
groups simultaneously). The means and variances of the four groups were
as follows:

Group      Mean      Variance

  1         74          30
  2         80          25
  3         78          33
  4         70          28

Is there any evidence that the four tests differ in difficulty? Define the 
populations sampled as precisely as you can. Discuss the problem of random 
sampling. Why was it important to test the students simultaneously? 

13.6.3. Eight stocks were selected by each of 5 methods and the increase
(or decrease) over a period of 6 months was noted for each stock. The means
and variances of the increases for each method are shown in the following
table:

Method      Mean      Variance

  A           0           4
  B           4          15
  C          -3           6
  D           1           7
  E           0           1

Is there any evidence that the 5 methods differ in the mean increase yielded?
Define the populations sampled as precisely as you can and discuss the problem
of random sampling.


13.7. The Partition of a Sample into k Groups. Let us now
approach F in a way that is superficially different from (but
actually equivalent to) that in the preceding chapters. Suppose
that we draw a sample of n at random from a normally distributed
population, and that we have chosen n so that it is
divisible by the integer k, that is, n = rk, where r is some
integer. We can decide beforehand to divide the sample arbitrarily
into k groups of r each, thus:

$$\begin{array}{cccc}
1 & 2 & \cdots & k \\
\hline
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
x_{r1} & x_{r2} & \cdots & x_{rk}
\end{array}$$

It can be shown that the total sum of squares of
deviations from the mean m of the total sample, which is
$\sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m)^2$, can be split into two parts, $S_b$ and $S_w$, such
that $S_b$ is the sum of squares of deviations of the group means,
$m_{.j}$, from the mean m, weighted by a factor r, as there are r
members in each group, that is, $\sum_{j=1}^{k} r(m_{.j} - m)^2$, and $S_w$ is the
sum of squares of deviations of the x's from their respective
group means, that is,

$$\sum_{i=1}^{r}(x_{i1} - m_{.1})^2 + \sum_{i=1}^{r}(x_{i2} - m_{.2})^2 + \cdots + \sum_{i=1}^{r}(x_{ik} - m_{.k})^2 = \sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m_{.j})^2$$

The subscript b indicates that the sum of squares arises from
variation between groups, w indicates that it arises from variation
within groups. In other words, we have the following:

$$\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m)^2}_{S} = \underbrace{\sum_{j=1}^{k} r(m_{.j} - m)^2}_{S_b} + \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m_{.j})^2}_{S_w}$$

A proof of this is as follows:
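$$\sum_j\sum_i (x_{ij} - m)^2 = \sum_j\sum_i \left[(x_{ij} - m_{.j}) + (m_{.j} - m)\right]^2$$
$$= \sum_j\sum_i (x_{ij} - m_{.j})^2 + 2\sum_j\sum_i (x_{ij} - m_{.j})(m_{.j} - m) + \sum_j\sum_i (m_{.j} - m)^2$$

But the middle term, when expanded, gives

$$2\left(\sum_j\sum_i x_{ij}m_{.j} - \sum_j\sum_i m_{.j}^2 - \sum_j\sum_i m x_{ij} + \sum_j\sum_i m\,m_{.j}\right)$$
$$= 2\left(\sum_j m_{.j}\sum_i x_{ij} - \sum_j r m_{.j}^2 - m\sum_j\sum_i x_{ij} + m\sum_j r m_{.j}\right)$$
$$= 2\left(\sum_j m_{.j}\,r m_{.j} - \sum_j r m_{.j}^2 - mrkm + mrkm\right) = 0$$

Further, $\sum_j\sum_i (m_{.j} - m)^2 = \sum_j r(m_{.j} - m)^2$. Thus our theorem
is proved.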

Corresponding to this division of S into $S_b$ and $S_w$ is a division
of the degrees of freedom. There are n - 1 degrees of freedom
in all (the size of the sample minus the number of constants computed
from the sample and used in the computation of the sum
of squares, there being only one such constant, the mean). The
number of degrees of freedom corresponding to $S_w$ is n - k (the
number of deviations n minus the number of computed constants
used, the k group means), and that corresponding to $S_b$ is
k - 1 (the number of deviations minus the number of computed
constants used, the one general mean). Note that
n - 1 = (n - k) + (k - 1).

By dividing each sum of squares by its corresponding number
of degrees of freedom, we obtain three estimates of the population
variance $\sigma^2$. Thus we have the following table:

                  Sum of squares                                              d.f.     Estimate of $\sigma^2$

Between groups    $S_b = r\sum_{j=1}^{k}(m_{.j} - m)^2$                       k - 1    $S_b/(k - 1)$

Within groups     $S_w = \sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m_{.j})^2$     n - k    $S_w/(n - k)$

Total             $S = \sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m)^2$            n - 1    $S/(n - 1)$
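The partition $S = S_b + S_w$ and the resulting estimates can be verified on a small table. The following Python sketch is an illustration only and is not part of the original text. The function name and the data are hypothetical, and each inner list holds one group (one column of the layout above).

```python
def partition(columns):
    """Split the total sum of squares of a sample arranged in k columns
    of r members each into between-group and within-group parts.
    Returns (S, S_b, S_w)."""
    k, r = len(columns), len(columns[0])
    n = r * k
    m = sum(sum(col) for col in columns) / n            # general mean
    col_means = [sum(col) / r for col in columns]       # group means m.j
    S_b = sum(r * (mj - m) ** 2 for mj in col_means)
    S_w = sum((x - mj) ** 2
              for col, mj in zip(columns, col_means) for x in col)
    S = sum((x - m) ** 2 for col in columns for x in col)
    return S, S_b, S_w

# Hypothetical sample of n = 9 divided into k = 3 groups of r = 3.
columns = [[3, 5, 4], [6, 8, 7], [5, 6, 4]]
S, S_b, S_w = partition(columns)
k, n = 3, 9
F = (S_b / (k - 1)) / (S_w / (n - k))    # referred to F with k-1 and n-k d.f.
print(S, S_b, S_w, F)
```

For these figures S = 20, $S_b$ = 14, and $S_w$ = 6, so that the two estimates are 7 and 1 and their ratio is 7, with 2 and 6 degrees of freedom.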
We state without proof¹ that $S_b/(k - 1)$ and $S_w/(n - k)$ are
independent estimates [though neither is independent of
$S/(n - 1)$], that is,

$$f\left[\frac{S_b}{k - 1}, \frac{S_w}{n - k}\right] = f\left[\frac{S_b}{k - 1}\right] \cdot f\left[\frac{S_w}{n - k}\right]$$

and that $\dfrac{S_b/(k - 1)}{S_w/(n - k)}$ is distributed as $F_{k-1,\,n-k}$.

The reader may already have realized that to partition arbitrarily
a sample of n, drawn from $N(\mu, \sigma^2)$, into k groups of r each
is equivalent to drawing k samples of r each from that same
population. In fact, as far as the sample values are concerned, it
is equivalent to drawing a sample of size r from each of k populations,
each normally distributed with the same mean and variance.
This, however, is the case discussed in Sec. 13.6; therefore
$S_b/(k - 1)$ should be equal to $rks_m^2/(k - 1)$ and $S_w/(n - k)$
should be equal to $\left(\sum_{j=1}^{k} s_j^2\,\dfrac{r}{r - 1}\right)/k$. This follows from the fact that

$$S_b = r\sum_{j=1}^{k}(m_{.j} - m)^2 = rk\,\frac{\sum_{j=1}^{k}(m_{.j} - m)^2}{k} = rks_m^2$$

and

$$S_w = \sum_{j=1}^{k}\sum_{i=1}^{r}(x_{ij} - m_{.j})^2 = \sum_{j=1}^{k} r\,\frac{\sum_{i=1}^{r}(x_{ij} - m_{.j})^2}{r} = r\sum_{j=1}^{k} s_j^2$$

The reader may wonder why we have introduced the notion of
partitioning a sample; the reason will be apparent in the remaining
section.

1 Proofs are given by Cramér (3) and Wilks (12).

13.8. The Double Partition of a Sample. Suppose that, as in
Sec. 13.7, we draw a sample of n (= rk) from $N(\mu, \sigma^2)$. Instead
of a single partition as before, we now divide our sample not only
into k groups of r each, but also into r groups of k each:
$$\begin{array}{cccc}
1 & 2 & \cdots & k \\
\hline
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
x_{r1} & x_{r2} & \cdots & x_{rk}
\end{array}$$

Whereas in our table in Sec. 13.7 the first subscript merely
numbered the members of a given column, it now indicates
membership in a certain row.

It can be shown (the student should show this as an exercise)
that the total sum of squares of deviations from the mean m,
which as before is $\sum\sum(x_{ij} - m)^2$, can be split into three parts, as
follows:

$$\underbrace{\sum_j\sum_i (x_{ij} - m)^2}_{S} = \underbrace{r\sum_j (m_{.j} - m)^2}_{S_c} + \underbrace{k\sum_i (m_{i.} - m)^2}_{S_r} + \underbrace{\sum_j\sum_i (x_{ij} - m_{i.} - m_{.j} + m)^2}_{S_{c \cdot r}}$$

The number of degrees of freedom associated with $S_{c \cdot r}$ is
rk - r - k + 1 or (r - 1)(k - 1), which is the product of the
degrees of freedom for rows and for columns respectively. We
obtain four estimates of $\sigma^2$, and we state without proof¹ that
$S_c/(k - 1)$, $S_r/(r - 1)$, and $S_{c \cdot r}/(k - 1)(r - 1)$ are independent
of each other. These estimates are shown in the following table:

                      Sum of squares                                               d.f.              Estimate of $\sigma^2$

Columns               $S_c = r\sum_j (m_{.j} - m)^2$                               k - 1             $S_c/(k - 1)$

Rows                  $S_r = k\sum_i (m_{i.} - m)^2$                               r - 1             $S_r/(r - 1)$

Columns X rows        $S_{c \cdot r} = \sum\sum(x_{ij} - m_{i.} - m_{.j} + m)^2$   (k - 1)(r - 1)    $S_{c \cdot r}/(k - 1)(r - 1)$
interaction

Total                 $S = \sum\sum(x_{ij} - m)^2$                                 rk - 1            $S/(rk - 1)$

1 Proofs are given by Cramér (3) and Wilks (12).
Let us consider the columns X rows interaction. If the variation
between rows, from column to column, were identical, this
interaction sum of squares would be zero. For example, the
following table gives a zero interaction sum of squares:

     1      2      3
   ------------------
     4      3      1
     6      5      3
     2      1     -1
     3      2      0

The difference between any two rows is constant from column
to column; similarly, the difference between any two columns is
constant from row to row. If this were not so, there would be a
nonzero interaction sum of squares. The interaction, therefore,
arises from the variation in row differences from column to
column, or in column differences from row to row. In trying
to understand the interaction sum of squares it is helpful to
consider $x_{ij}$ as the general mean m plus a deviation $d_{ij}$, to
consider $m_{i.}$ as m plus a deviation $d_{i.}$, and to consider $m_{.j}$ as
m plus a deviation $d_{.j}$. Then $\sum\sum(x_{ij} - m_{i.} - m_{.j} + m)^2$
becomes $\sum\sum(d_{ij} - d_{i.} - d_{.j})^2$ and we see that the interaction
sum of squares is the sum of squares which would remain if we
first subtracted the general mean from each entry in the table
and also from each of the column and row means and then subtracted
from each entry (which now would be a deviation from
the general mean) the deviations of its column and row means.

The interaction sum of squares can also be written
$\sum\sum(x_{ij} - d_{i.} - d_{.j} - m)^2$, which shows that it is the sum of squares of
deviations from m which would remain if we “corrected” each
$x_{ij}$ by subtracting from it its column mean’s deviation from m
and its row mean’s deviation from m.
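The double partition, and the vanishing of the interaction sum of squares for the table given above, can be checked by direct computation. The following Python sketch is an illustration only and is not part of the original text. The function name is chosen here, and the second table is a hypothetical variation in which a single entry has been altered.

```python
def double_partition(rows):
    """Split the total sum of squares of an r-by-k table into column,
    row, and interaction parts. Returns (S, S_c, S_r, S_cr)."""
    r, k = len(rows), len(rows[0])
    m = sum(sum(row) for row in rows) / (r * k)                  # general mean
    row_means = [sum(row) / k for row in rows]                   # m_i.
    col_means = [sum(row[j] for row in rows) / r for j in range(k)]  # m_.j
    S_c = r * sum((cm - m) ** 2 for cm in col_means)
    S_r = k * sum((rm - m) ** 2 for rm in row_means)
    S_cr = sum((rows[i][j] - row_means[i] - col_means[j] + m) ** 2
               for i in range(r) for j in range(k))
    S = sum((x - m) ** 2 for row in rows for x in row)
    return S, S_c, S_r, S_cr

# The table of the text: row differences are constant from column to column.
table = [[4, 3, 1], [6, 5, 3], [2, 1, -1], [3, 2, 0]]
S, S_c, S_r, S_cr = double_partition(table)

# Hypothetical variation: altering one entry destroys the constancy.
altered = [[5, 3, 1], [6, 5, 3], [2, 1, -1], [3, 2, 0]]
S2, S_c2, S_r2, S_cr2 = double_partition(altered)
print(S_cr, S_cr2)
```

In the first table the three parts reduce to the column and row sums of squares alone; in the altered table a nonzero interaction sum of squares appears, while the three parts still add to the total.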

We state without proof¹ that the ratio of any pair of our three
independent estimates of $\sigma^2$ is distributed as $F_{a,b}$, in which a is
the number of degrees of freedom for the estimate in the numerator
and b is the number of degrees of freedom for the estimate in
the denominator.

1 Proofs are given by Cramér (3) and Wilks (12).

Obviously there is no point in arbitrarily partitioning a sample 
and then making F tests; significant results would only make us 
suspicious of the randomness of our partitions. Applications 
arise when partitions are determined in advance by the design of 
an experiment. For example, suppose we want to compare an 
intellectual performance under normal conditions with that 
under the influence of a drug. If we plan to use 16 subjects in 
the experiment, there is an obvious and predetermined way 
of doubly partitioning the 32 observations, into drugs (two 
columns) and subjects (16 rows). 1 We then obtain the following 

ljS/Dl.0 l 



d.f. 

Estimate of <r 2 

Columns (drug vs. no drug). 

1 

15 

Sc 

O /I K 

Rows (subjects). 

Interaction. . . . 

Or/ i-O 

Scr /15 


1 o 


If the drug has no effect, if there are no differences between
subjects (except chance fluctuations), and if there is no
systematic experimental error, then the ratio of any two of the
estimates should be distributed as $F$. Note, however, that it is the
first hypothesis that we are primarily interested in testing. The
second hypothesis is one which we can be reasonably sure in
advance will not hold, and we want our test of the first hypothesis
to be valid regardless of whether there are differences
between subjects or not. The interaction estimate is based on
the variation within drug conditions after variation due to
subjects has been taken out; therefore, this is the estimate which we
should use as denominator in $F$. Suppose the following estimates
were actually obtained:


                 Sum of squares    d.f.    Estimate of $\sigma^2$
Columns               10.125         1         10.125
Rows                 346.875        15         23.125
Interaction           19.875        15          1.325
Total                376.875        31

¹ In this discussion we ignore the problem of order of conditions
(drug and then no drug, or vice versa).


The comparison in which we are most interested yields
$F_{1,15} = 10.125/1.325 = 7.64$, which is significant at less than the
.025 level (one-tailed test).

The interaction estimate in the above example can be better
understood if we consider an analysis of the same data by means
of $t$. The following table shows the data by subjects, a difference
column, and the calculation of a paired $t$.


Subject    No drug    Drug    No drug minus drug
   1         56        57            -1
   2         52        50             2
   3         51        53            -2
   4         46        45             1
   5         53        53             0
   6         47        44             3
   7         52        50             2
   8         49        48             1
   9         50        51            -1
  10         57        55             2
  11         51        50             1
  12         53        49             4
  13         50        47             3
  14         45        45             0
  15         49        47             2
  16         53        52             1
Total                                18


Mean difference = $1\tfrac{1}{8}$
Variance of differences = $159/64$

$$t = \frac{\text{mean difference}}{\sqrt{s^2/15}} = \frac{9/8}{\sqrt{159/[(64)(15)]}} = 2.764 \qquad p < .02$$

The hypothesis we have tested by $t$ is that the 16 differences
are drawn at random from a normal population with zero mean.
Note, however, that $t^2 = 7.64 = F$. Our paired $t$ test is the
exact equivalent of our $F$ test using the interaction estimate as
denominator. The interaction estimate, therefore, can be
considered as derived, just as the denominator of $t$, from variation
in individuals in differences between drug and no drug. The
student should answer for himself the following question: why,
in this example, is a two-tailed $t$ test the equivalent of a
one-tailed $F$ test?
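
The equivalence of the two analyses can be checked numerically. The following is only a sketch in Python, not part of the original computations: the function names `partition_sums` and `paired_t` are hypothetical, and the scores are those of the table above. The variance of the differences is taken with divisor n, which is the convention that reproduces the value 159/64 in the text.

```python
import math

# (no drug, drug) scores for the 16 subjects in the table above.
scores = [
    (56, 57), (52, 50), (51, 53), (46, 45), (53, 53), (47, 44),
    (52, 50), (49, 48), (50, 51), (57, 55), (51, 50), (53, 49),
    (50, 47), (45, 45), (49, 47), (53, 52),
]


def partition_sums(table):
    """Return (S_c, S_r, S_cr) for a rows-by-columns table, one score per cell."""
    r, k = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(k)]
    s_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    s_c = r * sum((m - grand_mean) ** 2 for m in col_means)
    s_r = k * sum((m - grand_mean) ** 2 for m in row_means)
    # The interaction sum of squares is what remains of the total.
    return s_c, s_r, s_total - s_c - s_r


def paired_t(table):
    """Paired t; variance of differences uses divisor n, as in the text."""
    d = [a - b for a, b in table]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / (n - 1))


s_c, s_r, s_cr = partition_sums(scores)
f_ratio = (s_c / 1) / (s_cr / 15)   # columns estimate over interaction estimate
t = paired_t(scores)
print(f"S_c={s_c:.3f}  S_r={s_r:.3f}  S_cr={s_cr:.3f}")
print(f"F={f_ratio:.3f}  t={t:.3f}  t^2={t * t:.3f}")
```

Running the sketch should reproduce the three sums of squares of the table and show that the square of t agrees with F apart from rounding.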

If we had had more than two drug conditions, we would have
used the interaction estimate, as before, in the denominator of
$F$. Caution, however, would need to be exercised in drawing
any conclusion about the population of scores under different
drug conditions from which our subjects' scores could be
considered a random sample, the reason being that with more than
two columns the number of degrees of freedom associated with
the interaction estimate (when the same subjects are used in
each column) becomes spuriously large. This becomes obvious
when we consider the case of two subjects and 20 conditions.
The interaction estimate would be associated with 19 degrees of
freedom, whereas only two subjects were actually used! A
significant $F$ would not allow us to draw an inference about people
in general, although, depending upon the design of the
experiment, we might be able to infer something about these two
particular subjects.

Whenever an interaction estimate is used for the denominator
of $F$, it is important that one of the variables be random (as in
the preceding example, in which the subject variable is random,
that is, subjects are randomly selected from some population);
otherwise, $F$ is difficult to interpret. For example, if instead of
16 subjects we had had 16 psychological tests, using the same
subject for each test, our $F$ test would imply that we were
considering the 16 test differences as a random sample from this one
subject's population of test differences, an interpretation which
would rarely be appropriate. If each difference were based upon
a different test and a different subject, we would have
confounded a random variable with a nonrandom one, and although
a significant $F$ would allow us to reject the hypothesis that
differences based on the 16 tests (assigning subjects randomly to
tests and equating numbers of subjects on different tests) are
normally distributed with mean zero, we could not claim that
differences for any one of the tests have a nonzero mean.

A sample of size $n$ can often be partitioned into equal groups
in more than two ways; in fact, if $n = p_1^{t_1} p_2^{t_2} \cdots p_k^{t_k}$, in which
each $p_i$ is a prime integer (an integer greater than 1 which is not
the product of any two integers other than itself and 1) and each
$t_i$ is an integer, then the sample can be partitioned into equal
groups in $\sum t_i$ ways. This multiple partitioning results in a
complex breakdown of the sum of squares, with higher-order
interactions, but we shall not pursue this topic further. The
student is referred to the references by Edwards (14), Fisher
(5,6), Lindquist (16,17), McNemar (18), Rider (19), Snedecor
(20), and, on a more advanced level, Mood (9).

There is one comment, however, which needs to be made about a common
procedure called pooling. In some cases in which the logically correct
denominator of $F$ (for a given comparison) is an estimate based on interaction, the
comparison desired is preceded by a comparison of the interaction estimate
with an appropriate higher-order interaction estimate (using the first
interaction estimate as numerator). If this latter comparison yields a
nonsignificant $F$, the two sums of squares are combined to yield a "more reliable"
estimate (with degrees of freedom equal to the sum of the two numbers of degrees
of freedom), and this "more reliable" estimate is used as denominator in
making the desired comparison. Although pointed out by Wilks (12) and
other mathematicians, it is sometimes overlooked in applications that, when
a pooled estimate is made, the hypothesis tested is no longer the same; it is
being assumed that the first interaction is zero. A nonsignificant $F$ does not
prove this assumption, any more than any other failure to reject a
hypothesis thereby proves it. If the numbers of degrees of freedom are small, there
could be an appreciable interaction in the population and yet a nonsignificant
$F$ from the sample. Although pooling is no doubt useful in certain
applications, it is very questionable in some psychological experimentation in which
it has often been employed. Almost any statistically minded psychologist
would disapprove of a paired $t$ test made by pooling 20 differences on each of
three subjects to obtain either 38 or 57 additional degrees of freedom
(depending upon whether the 19 degrees of freedom for trials were eliminated or not),
justified by the argument that an $F$ test failed to show individual differences
(in the mean differences) among the three subjects, yet this procedure is
mathematically identical with pooling if an analysis of variance were performed on
the same data. The fact that pooling is not found in the literature using
paired $t$ leads one to suspect that pooling in the analysis of variance has
sometimes been employed without full awareness of the assumptions involved. It
is good to keep in mind always the population about which an inference is
desired and to remember that one can never have more degrees of freedom
than one has independent observations from this population. In the example
cited, we certainly do not have more than three independent observations
from the population of differences of people from which the three subjects
can be considered a random sample; if we are interested in drawing a
conclusion about these three subjects themselves, we might under certain
circumstances be justified in considering our 60 differences as a random sample, but
if we are interested in generalizing to a larger population of people, no amount
of statistical juggling can give us more than 2 degrees of freedom.


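13.9. Computation of Sums of Squares. As an exercise the
student should derive the following computational formulas for
$S$, $S_b$, and $S_w$, in which $n = rk$:

$$S = \frac{1}{n}\Bigl[n \sum_j \sum_i x_{ij}^2 - \Bigl(\sum_j \sum_i x_{ij}\Bigr)^2\Bigr]$$

$$S_b = \frac{1}{n}\Bigl[k \sum_j \Bigl(\sum_i x_{ij}\Bigr)^2 - \Bigl(\sum_j \sum_i x_{ij}\Bigr)^2\Bigr]$$

$$S_w = S - S_b = \frac{1}{r}\Bigl[r \sum_j \sum_i x_{ij}^2 - \sum_j \Bigl(\sum_i x_{ij}\Bigr)^2\Bigr]$$

Thus for the case of a partition into $k$ groups, three sums need
to be computed: the sum of scores $\sum\sum x_{ij}$, the sum of squares of
scores $\sum\sum x_{ij}^2$, and the sum of squares of group sums
$\sum_j \bigl(\sum_i x_{ij}\bigr)^2$.

The following computational formulas can be derived for the
case of double partitionings:

$$S = \frac{1}{n}\Bigl[n \sum_j \sum_i x_{ij}^2 - \Bigl(\sum_j \sum_i x_{ij}\Bigr)^2\Bigr]$$

$$S_c = \frac{1}{n}\Bigl[k \sum_j \Bigl(\sum_i x_{ij}\Bigr)^2 - \Bigl(\sum_j \sum_i x_{ij}\Bigr)^2\Bigr]$$

$$S_r = \frac{1}{n}\Bigl[r \sum_i \Bigl(\sum_j x_{ij}\Bigr)^2 - \Bigl(\sum_j \sum_i x_{ij}\Bigr)^2\Bigr]$$

$$S_{cr} = S - S_c - S_r$$

Thus in the case of double partitioning there are two sums of
squares of group sums (one for column sums and one for row
sums) to be computed.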
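
The computational formulas for a double partitioning can be sketched in a few lines of Python. This is an illustration only: the function name `sums_of_squares`, the dictionary keys, and the small 3 x 2 table `demo` are hypothetical and do not come from the text. The table is taken to have r rows and k columns with one score per cell, so that n = rk.

```python
def sums_of_squares(table):
    """Computational formulas for a table with r rows and k columns.

    table[i][j] is the score in row i and column j; n = r * k.
    """
    r, k = len(table), len(table[0])
    n = r * k
    total = sum(sum(row) for row in table)                      # sum of scores
    sum_sq = sum(x * x for row in table for x in row)           # sum of squared scores
    col_sq = sum(sum(row[j] for row in table) ** 2 for j in range(k))
    row_sq = sum(sum(row) ** 2 for row in table)
    s = (n * sum_sq - total ** 2) / n
    s_c = (k * col_sq - total ** 2) / n
    s_r = (r * row_sq - total ** 2) / n
    return {"S": s, "S_c": s_c, "S_r": s_r, "S_cr": s - s_c - s_r}


# Hypothetical 3 x 2 table, chosen only so the arithmetic is easy to follow.
demo = [[3, 5], [4, 8], [2, 8]]
result = sums_of_squares(demo)
print(result)
```

For the hypothetical table the grand mean is 5, and summing squared deviations directly gives the same total of 32 that the computational formula yields, which is a convenient check on a derivation.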


EXERCISES 

13.9.1. The following table gives the mean yield for each of 4 varieties of
corn on each of 5 blocks of land, randomly selected from a large area. The
means are all based upon comparable sets of small plots within each block.

                  Variety
Block       1      2      3      4
  a         7      6      6      7
  b        10      8      7      9
  c         6      3      5      7
  d         4      3      3      3
  e         8      5      5      6

a. Test the hypothesis that the varieties differ in mean yield only by random
sampling. If you reject this hypothesis, try to state precisely the
assumptions, statistical and otherwise, that you are making.

b. Test the hypothesis that the blocks differ in mean yield only by random
sampling. If you reject this hypothesis, state your assumptions as precisely
as possible.

c. Test the hypothesis that the varieties differ only by random sampling,
treating the data as 5 random samples, ignoring the classification by blocks.
Why does this analysis give a result different from that obtained in a?

d. If instead of blocks of land, randomly selected, we had methods of
fertilization, with the same numerical table of data, why would the analysis you
have made be inappropriate?

13.9.2. A large room is filled with thousands of small boxes. Each box
contains five cards, in order, each card having a number written on it. In
half the boxes the number on the first card is distributed approximately
normally with mean 1 and variance 1; the number on the second card is
distributed approximately normally with mean 2 and variance 1, . . . , the
number on the fifth card is distributed approximately normally with mean 5 and
variance 1. In the other half of the boxes, the means are reversed, that is,
the number on the first card is normally distributed with mean 5 and variance
1, etc. Explain as fully as you can why someone not knowing the
distributions cannot test the hypothesis that the numbers on the five orders have the
same mean by taking a random sample of five boxes and making an analysis
of variance, using the following format:

                      Order of card
          First    Second    Third    Fourth    Fifth

               Sum of squares    d.f.    Estimate of var.    F
Orders                             4           O            O/I
Boxes                              4           B            B/I
Interaction                       16           I

In particular, show that if this analysis were planned, the probability of
rejecting the hypothesis of equal means (for orders) at the .01 level of
confidence is greater than .01.






Chapter 14 

NONPARAMETRIC STATISTICS 


14.1. Definition. The sampling distribution of a statistic
usually depends upon the form of the distribution of the population.
For example, the sample mean will be distributed normally only 
if the population is normally distributed, although as the size of 
the sample increases the sampling distribution of the mean 
approaches normality as a limiting form. Some statistics have 
sampling distributions depending considerably upon the form of 
the population distribution even for large samples. In many 
applications, little is known about the population distributions 
and it is desirable to make as few assumptions about their forms 
and parameters as possible. Of particular interest, therefore, 
are statistics not involving any assumptions at all other than 
random sampling; methods employing these statistics are called 
nonparametric or distribution-free methods. In our discussion of 
finite populations we were using these methods, and certain uses 
of chi square which we have discussed also fall into the
nonparametric category, though only for large samples.

It is usually inefficient to use these methods when justifiable 
assumptions about the form and/or parameters of the
population sampled can be made, as nonparametric methods are less
powerful than tests utilizing assumptions when the assumptions 
are in fact correct. 

14.2. The Sign Test. One of the simplest nonparametric
methods can be used in testing the hypothesis that equal
proportions of a population lie above and below a certain point $c$.
Consider drawing a random sample from such a population and
discarding all members having the value $c$; the probability that
exactly $r$ of the $N$ remaining members will have values less than
$c$ is $C_r^N (1/2)^N$, and the probability that at most $r$ members will
have values less than $c$ is $\sum_{i=0}^{r} C_i^N (1/2)^N$. For a discussion of how
to use these probabilities in making tests of significance and
confidence intervals the student is referred to Chaps. 3
and 11. Critical values of $r$ (two-tailed test) for
$N \le 90$ and for confidence levels .01, .05, .10, and .25 are
given in Appendix D.

Of particular interest is the special case in which the
population sampled is a population of pairs, the value of each pair being
either plus or minus depending upon which member is greater (that is,
$c = 0$; hence the name "sign test"). This situation arises, for
example, when two sets of observations are made at different
times or under different experimental conditions, each member
of the first set being paired with one and only one member of the
second set, or when two groups of subjects are matched in pairs,
or when plots of ground, industrial products, etc., are matched in
pairs. If $c$ is taken as a number other than 0 (the value of each
pair being the difference between the two), the assumption must
be made that the values are quantitative, not merely ordinal.¹
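
As an illustration of the paired case, the tail probability given above can be evaluated directly. The sketch below is in Python and is not part of the original text: the function name `sign_test_tail` is hypothetical, the data are the 16 no-drug minus drug differences of the paired example in Chapter 13, and doubling the one-tailed probability to obtain a two-tailed value is the usual convention, stated here as an assumption.

```python
from math import comb


def sign_test_tail(r, n):
    """Probability that at most r of n untied signs are of one kind
    when plus and minus are equally likely: sum of C(n, i) (1/2)^n."""
    return sum(comb(n, i) for i in range(r + 1)) / 2 ** n


# No-drug minus drug differences for the 16 subjects of the paired example.
differences = [-1, 2, -2, 1, 0, 3, 2, 1, -1, 2, 1, 4, 3, 0, 2, 1]

untied = [d for d in differences if d != 0]     # zero differences are discarded
n = len(untied)
minus = sum(1 for d in untied if d < 0)
r = min(minus, n - minus)                       # count of the less frequent sign

one_tailed = sign_test_tail(r, n)
two_tailed = min(1.0, 2 * one_tailed)
print(n, r, round(one_tailed, 4), round(two_tailed, 4))
```

With two zero differences discarded there remain 14 signs, 3 of them minus, and the one-tailed probability is 470/16384. The sign test uses only the directions of the differences, which is why it is less sensitive here than the paired t computed from the same data.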

14.3. The Run Test. In many applications a sequence of $n$
observations is made, each observation resulting in one or the
other of two values, $a$ or $b$. If each of the observed events is
independent of the others, then all possible permutations of the
$a$'s and $b$'s are equally probable. If, on the other hand, there is
a systematic trend in the sequence, certain permutations will be
highly improbable. The number of runs is one indication of the
presence or absence of trend (or other lack of independence) in
a sequence. A run is a sequence of like observations which
cannot be extended by including an observation on either side.
For example, the sequence

aababaaabbabbabba

contains 11 runs (aa, b, a, b, aaa, bb, a, bb, a, bb, a), 6 being
$a$ runs and 5 being $b$ runs.

¹ More complete discussions of the sign test are given by W. J. Dixon and
A. M. Mood, The statistical sign test, Journal of the American Statistical
Association, vol. 41, pp. 557-566, 1946; and by Dixon and Massey (13).

The probability (assuming independence) of obtaining $r$ runs
with $n_1$ $a$'s and $n_2$ $b$'s is given by a complicated expression which
we do not reproduce here.¹ Too large or too small sizes of $r$
enable one to reject the hypothesis of independence. Tables of
significance are given in the Appendix for $n_1, n_2 \le 20$, level
of confidence = .05 (two-tailed test). For $n_1$ and $n_2$ each
greater than 20 the number of runs is approximately normally
distributed with mean $2n_1n_2/(n_1 + n_2) + 1$ and variance
$2n_1n_2(2n_1n_2 - n_1 - n_2)/[(n_1 + n_2)^2(n_1 + n_2 - 1)]$.
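
Counting runs and evaluating the mean and variance of the normal approximation is mechanical, and a short Python sketch may help. The function names `count_runs` and `run_moments` are hypothetical. The sequence used is the 17-observation example of this section, which is far too short for the normal approximation to be trusted; it serves only to show the arithmetic.

```python
def count_runs(sequence):
    """Number of runs: a new run starts at the first observation and
    wherever an observation differs from the one before it."""
    return sum(1 for i, x in enumerate(sequence)
               if i == 0 or x != sequence[i - 1])


def run_moments(n1, n2):
    """Mean and variance of the number of runs under independence."""
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return mean, variance


sequence = "aababaaabbabbabba"
n1, n2 = sequence.count("a"), sequence.count("b")
runs = count_runs(sequence)
mean, variance = run_moments(n1, n2)
print(runs, n1, n2, round(mean, 4), round(variance, 4))
```

For samples large enough to justify the approximation, the observed number of runs minus the mean, divided by the square root of the variance, would be referred to the normal table.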

In making the run test the observations can be ordered into a 
sequence by means of any principle with respect to which a test 
of mutual independence of observations is desired. For 
example, in production processes it is important to watch for any 
change in the product with respect to time. Further, any 
principle of dichotomizing which is logically independent of the 
ordering principle can be chosen; for example, products can be 
dichotomized according to whether on a certain measure they 
exceed a certain amount, whether they exceed the median of the
total sequence, etc.

14.4. Tolerance Limits. In many applications, especially in
controlling the quality of an industrial product, it is useful to
find two numbers, $L_1$ and $L_2$, between which it can be asserted,
at any desired level of confidence, that a given proportion of the
population lies. There are various ways of choosing $L_1$ and $L_2$;
one particularly simple way is to take the lowest and highest
values, respectively, in the sample. It can be shown² that if the
population sampled has a continuous distribution, then no
matter what its form or parameters, the probability density of
$z$, the proportion of the population lying between the extreme
values of the sample, is given by

$$f(z) = n(n - 1) z^{n-2} (1 - z)$$

in which $n$ is the size of the sample. Thus $z$ and $n$ alone
determine the density of $z$. To find what sample size will enable us
to assert, at the $p$ level of confidence, that proportion $P$ (at least)
of the population lies between $L_1$ and $L_2$, the extreme values of
the sample, we need to find the value of $n$ satisfying the equation

$$\int_P^1 f(z)\,dz = 1 - p$$

The integral is

$$n(n - 1) \int_P^1 z^{n-2}(1 - z)\,dz = 1 - nP^{n-1} + (n - 1)P^n$$

The resulting equation,

$$nP^{n-1} - (n - 1)P^n = p$$

is laborious to solve; a close approximation is given by

$$n \approx \tfrac{1}{4}\,\chi_p^2\,\frac{1 + P}{1 - P} + \tfrac{1}{2}$$

in which $\chi_p^2$ is the value of chi square for 4 degrees of freedom for
which prob $(\chi^2 > \chi_p^2) = p$.

¹ This expression and its derivation are given by Hoel (8).

² A proof is given by Hoel (8).

For example, to find the size sample which we need in order to
be confident, at the .01 level, that .95 of the population lies
between the extreme values of the sample, we take

$$n \approx \tfrac{1}{4}\,\chi_{.01}^2\,\frac{1 + .95}{1 - .95} + \tfrac{1}{2} = \tfrac{1}{4}(13.3)\,\frac{1.95}{.05} + \tfrac{1}{2} = 130.175$$

The sample size required is therefore 130, the integer nearest to
the solution.
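
The exact equation, though laborious by hand, is easily solved by trial on a machine, and the result can be compared with the chi-square approximation. The following Python sketch is offered under stated assumptions: the function names are hypothetical, the exact solution is taken to be the smallest n for which the left side of the equation does not exceed p, and the chi-square value is obtained from the closed form of the upper tail for 4 degrees of freedom, exp(-x/2)(1 + x/2), rather than from a table.

```python
import math


def shortfall_probability(n, P):
    """prob(z < P) for a sample of size n: n P^(n-1) - (n-1) P^n."""
    return n * P ** (n - 1) - (n - 1) * P ** n


def exact_sample_size(P, p):
    """Smallest n for which prob(z < P) is no greater than p."""
    n = 2
    while shortfall_probability(n, P) > p:
        n += 1
    return n


def chi_square_4df(p):
    """Value x with prob(chi square > x) = p for 4 degrees of freedom,
    found by bisection on the closed-form tail exp(-x/2) (1 + x/2)."""
    low, high = 0.0, 200.0
    for _ in range(200):
        mid = (low + high) / 2
        if math.exp(-mid / 2) * (1 + mid / 2) > p:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def approximate_sample_size(P, p):
    """The approximation (1/4) chi^2_p (1 + P)/(1 - P) + 1/2."""
    return 0.25 * chi_square_4df(p) * (1 + P) / (1 - P) + 0.5


exact = exact_sample_size(0.95, 0.01)
approx = approximate_sample_size(0.95, 0.01)
print(exact, round(approx, 3))
```

For P = .95 and p = .01 the search and the approximation lead to the same integer. The approximation comes out slightly below the 130.175 of the worked example because the sketch uses an unrounded chi-square value in place of the tabled 13.3.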

14.5. Order Statistics. The method of finding tolerance
limits, described in Sec. 14.4, is an example of the use of what are
known as order statistics. Consider drawing a random sample
of size $n$ from any continuously distributed population; if the
members of the sample are ranked according to size, they are
order statistics. In the following discussion $x_1$ will denote the
value of the smallest member of the sample, $x_2$ the value of the
second smallest, . . . , $x_n$ the value of the largest. In Sec. 14.4
an application was made of the known distribution of $z$, the
proportion of the population lying between $x_n$ and $x_1$.

It can be shown¹ that the expected proportion of the
population smaller than $x_i$ is $i/(n + 1)$. Thus the expected
proportion lying between $x_i$ and $x_{i+1}$ (two successively ordered
observations) is

$$\frac{i + 1}{n + 1} - \frac{i}{n + 1} = \frac{1}{n + 1}$$

In other words, the $n$ order statistics tend to divide the
population into $n + 1$ equal parts, regardless of the distribution of
the population. The two extreme values, $x_1$ and $x_n$, tend to include

$$\frac{n + 1}{n + 1} - \frac{2}{n + 1} = \frac{n - 1}{n + 1}$$

of the population.

The way in which the $n$ order statistics tend to divide the
population is illustrated in Fig. 14.5.1.


[Figure: a population distribution divided by the order statistics
$x_1, x_2, x_3, \ldots, x_{n-1}, x_n$ into $n + 1$ parts, each labeled $1/(n + 1)$.]

Fig. 14.5.1. Division of a population into n + 1 equal proportions by n order
statistics.
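
The statement that the order statistics tend to divide the population into n + 1 equal parts can be checked by a sampling experiment. The Python sketch below is only an illustration: the function name `mean_proportion_below` is hypothetical, and the population is assumed to be uniform on the interval from 0 to 1, a choice made for convenience because the proportion of such a population lying below a value x is x itself. The seed is fixed so that the experiment can be repeated.

```python
import random


def mean_proportion_below(n, trials=20000, seed=0):
    """Average, over many samples of size n, of the proportion of a
    uniform(0, 1) population lying below each order statistic."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        for i, x in enumerate(sample):
            totals[i] += x          # for this population the proportion below x is x
    return [t / trials for t in totals]


n = 4
estimates = mean_proportion_below(n)
expected = [i / (n + 1) for i in range(1, n + 1)]
for est, exp in zip(estimates, expected):
    print(round(est, 3), exp)
```

With samples of size 4 the averages should lie close to .2, .4, .6, and .8, the agreement improving as the number of trials is increased.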

14.6. Confidence Intervals for Percentile Points. Consider
drawing at random a sample of size $n$ from any continuously
distributed population. If $\xi_p$ is the value below which the
proportion $p$ of the population lies [that is, $\xi_p$ is defined by $F(\xi_p) = p$],
then the probability that exactly $i$ members of the sample will
have values less than $\xi_p$ is $C_i^n p^i (1 - p)^{n-i}$. Now consider the
probability that $x_r$, the $r$th order statistic, exceeds $\xi_p$. This can
happen if $x_r$ exceeds $\xi_p$ but $x_{r-1}$ does not (and thus exactly $r - 1$
members have values less than $\xi_p$) or if $x_{r-1}$ also exceeds $\xi_p$
but $x_{r-2}$ does not (and thus exactly $r - 2$ members have
values $< \xi_p$), . . . , or if all members of the sample exceed $\xi_p$.
These events are exclusive of each other, therefore the
probability that $x_r > \xi_p$ is

$$\sum_{i=0}^{r-1} C_i^n p^i (1 - p)^{n-i}$$

In a similar way it can easily be shown that

$$\text{prob}\,(x_s < \xi_p) = \sum_{i=s}^{n} C_i^n p^i (1 - p)^{n-i}$$

¹ A proof is given by Mood (9).

The probability that in the same sample either $x_r > \xi_p$ or
$x_s < \xi_p$ is prob $(x_r > \xi_p)$ + prob $(x_s < \xi_p)$ - prob $(x_r > \xi_p$ and
$x_s < \xi_p)$. If $s$ is taken greater than $r$, then prob $(x_r > \xi_p$ and
$x_s < \xi_p) = 0$. We have then

$$\begin{aligned}
\text{prob}\,(x_r < \xi_p < x_s) &= \text{prob}\,(\text{neither } x_r > \xi_p \text{ nor } x_s < \xi_p) \\
&= 1 - \text{prob}\,(x_r > \xi_p \text{ or } x_s < \xi_p) \\
&= 1 - \sum_{i=0}^{r-1} C_i^n p^i (1 - p)^{n-i} - \sum_{i=s}^{n} C_i^n p^i (1 - p)^{n-i} \\
&= \sum_{i=r}^{s-1} C_i^n p^i (1 - p)^{n-i}
\end{aligned}$$

Thus we can find confidence intervals for $\xi_p$ at the $1 - q$ level of
confidence by finding values of $r$ and $s$ for which

$$\sum_{i=r}^{s-1} C_i^n p^i (1 - p)^{n-i} = q$$

For example, a .05 confidence interval for $\xi_{.20}$ can be obtained
from a sample of size 20 by finding values of $r$ and $s$ such that

$$\sum_{i=r}^{s-1} C_i^{20} \left(\tfrac{1}{5}\right)^i \left(\tfrac{4}{5}\right)^{20-i} = .95$$

The tables for the sign test can be used for finding confidence
intervals for the median $\xi_{.50}$.

A confidence interval with one bound only can of course be
found by making use of either $x_r$ or $x_s$ alone. The derivations
are left to the student.

It should also be mentioned that point estimates of $\xi_p$ can be
made by using the $x_i$'s as estimates of the $i/(n + 1)$ points and
interpolating linearly.
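
Since r and s must be integers, the binomial sum can seldom be made exactly equal to the desired value, and in practice one looks for a pair giving at least that value. The Python sketch below carries out such a search for the example of a sample of 20 and the .20 point. The function names are hypothetical, and the rule adopted (the pair with the fewest order statistics between r and s whose probability is at least q) is one reasonable choice made for illustration, not a rule given in the text.

```python
from math import comb


def interval_probability(n, p, r, s):
    """prob(x_r < xi_p < x_s): sum of C(n, i) p^i (1-p)^(n-i) for i = r, ..., s-1."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(r, s))


def shortest_interval(n, p, q):
    """Pair (r, s) with the smallest s - r whose probability is at least q.

    Returns (r, s, probability), or None if no pair qualifies.
    """
    best = None
    for r in range(1, n):
        for s in range(r + 1, n + 1):
            prob = interval_probability(n, p, r, s)
            if prob >= q and (best is None or s - r < best[1] - best[0]):
                best = (r, s, prob)
    return best


r, s, prob = shortest_interval(20, 0.20, 0.95)
print(r, s, round(prob, 4))
```

The same function with p = .50 gives intervals for the median, which may be compared with those read from the sign test tables.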

14.7. Wilcoxon's Matched-pairs Signed-ranks Test.¹ When
a population of paired observations is being sampled, and when
the differences between pairs can be considered quantitative,
then a somewhat more powerful test of the hypothesis that the
differences are symmetrically distributed about zero can be made
than by using the sign test described in Sec. 14.2. If the
differences are ranked according to absolute size, the sum of ranks for
positive differences should tend to equal the sum of ranks for
negative differences, if the hypothesis is true. If either sum is
smaller than a certain critical value, the hypothesis can be
rejected. Tables of significance values for the smaller sum of
ranks for 6 to 25 differences and for the .05, .02, and .01 levels of
confidence (two-tailed test) are given in Appendix D. The $p$
values must be halved for a one-tailed test. For more than
25 differences each sum of ranks is approximately normally
distributed with mean $N(N + 1)/4$ and variance
$N(N + 1)(2N + 1)/24$, in which $N$ is the number of differences.¹

¹ The derivation of this test is given by Frank Wilcoxon, Individual comparisons
by ranking methods, Biometrics Bulletin, vol. 1, pp. 80-82, 1945.
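
The ranking procedure can be sketched in Python as follows. The function names and the six differences are hypothetical and serve only as an illustration. Two conventions are assumed that the text does not state: zero differences are dropped before ranking, and differences tied in absolute size are each given the average of the ranks they jointly occupy.

```python
def signed_rank_sums(differences):
    """Return (sum of ranks of positive differences,
               sum of ranks of negative differences)."""
    untied = [d for d in differences if d != 0]      # assumption: zeros dropped
    ordered = sorted(abs(d) for d in untied)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2        # mean of ranks i+1, ..., j
        i = j
    positive = sum(rank_of[abs(d)] for d in untied if d > 0)
    negative = sum(rank_of[abs(d)] for d in untied if d < 0)
    return positive, negative


def null_moments(n):
    """Mean and variance of either rank sum under the hypothesis."""
    return n * (n + 1) / 4, n * (n + 1) * (2 * n + 1) / 24


demo = [3, -1, 4, -2, 5, 6]          # hypothetical differences, no ties
positive, negative = signed_rank_sums(demo)
smaller = min(positive, negative)
mean, variance = null_moments(len(demo))
print(positive, negative, smaller, mean, variance)
```

The smaller of the two sums is the quantity referred to the table; for more than 25 differences it would instead be compared with the mean and variance returned by `null_moments`.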

EXERCISES 

14.7.1. How many television tubes must be tested to determine limits within
which it can be asserted at the .05 level of confidence that 99 per cent of the
tube lifetimes lie?

14.7.2. An individual who claimed to have extrasensory powers of
perception maintained that the reason she did not score significantly more "hits"
(correct naming of cards) than chance was that she would receive correctly
for several times, then incorrectly for several times, etc.; that is, that her
ESP powers "came and went in spurts." In an experiment she scored the
following sequence of hits and misses:

HHHMHHMMMMHHHMMMHHHHMMMHHHHMM
HHHMMM

Do these data support her claim; if they do, at what level of confidence?

14.7.3. In an experiment on cognitive processes, each of 12 subjects was
classified according to whether a rather complex contingency table yielded a
positive or negative relationship. There were 10 negative and 2 positive
relationships found. Is there evidence of a preponderance of negative
relationships in the population sampled?

14.7.4. In an experiment using identical twins, one member of each pair
was tested under one set of conditions (A) and the other member was tested
under another set of conditions (B). The datum recorded was the number of
seconds it took the subject to solve a certain problem, and the following results
were obtained:

Twin pair      A       B
    1         80      39
    2         21      16
    3         74      90
    4         53      32
    5        128      62
    6          5      45
    7         60      15
    8         25      33
    9        111      91
   10         54      42
   11         64      81
   12         42      18
   13         73      79
   14         92      36
   15         40      41

Make two nonparametric and one classical test with these data, explaining
the hypothesis tested and assumptions involved in each case.

¹ A further discussion of nonparametric methods, as well as a good bibliography,
is given by Lincoln E. Moses, Non-parametric statistics for psychological research,
Psychological Bulletin, vol. 49, pp. 122-143, 1952.

14.7.5. A machine which is claimed to turn out digits "at random" gives
the following sequence:

4 0 2 8 3 9 7 1 2 2 9 6 4 0 6 3 5 1 8 0 7 3 8 1 2 4 3 3 6 4

Test this sequence for "randomness" considering it as a sequence of

a. odds and evens (considering zero even)

b. numbers greater or not greater than 2

c. primes and nonprimes (2, 3, 5, and 7 are primes)

Appendix A


SOME HINTS ON HOW TO ASK 
QUESTIONS OF MATHEMATICAL 
STATISTICIANS 


In modern mathematics only the abstract structural properties of objects
(or sets of objects) are of any real concern; what the objects are is largely
irrelevant. Mathematical statistics is no exception: what the members of a
population (or sample) are is of no importance whatever. When any problem
is stated in concrete terms, that is, in terms of what objects actually are, it is
necessary to abstract from the concrete situation the essential structural
properties before the problem can be handled mathematically (in the following
discussion, this kind of abstracting will be referred to as "structuring"). To
most mathematicians a problem becomes really interesting only when (but not
necessarily when) it is structured. Unless a mathematician is interested in
applications, he is not particularly interested in structuring nor has he been
especially trained in this process. It is true that many problems of pure
mathematics have grown out of concrete situations or concrete ways of
thinking (this is especially true of the theory of probability) and in mathematical
terminology are found remnants of this origin ("slope," "rate," "neighborhood,"
"knot," "Monte Carlo method"), but when one recognizes that in modern
mathematics even the term "point" has lost its intuitive
geometrical meaning, it is apparent how far the process of structuring has gone.
The mistake commonly made in consulting a mathematician is in
telling him what the members of a population (or sample) are (that is, telling
him what observations one is going to make) and leaving the entire burden
of structuring to him. Even this mistake might seldom be fatal if only
the relevant were told. Unfortunately, the irrelevant as well as the
relevant is often included, and some of the relevant is omitted. The
mathematician is apt to be overwhelmed by the complexities and technical jargon of
a field other than his own and either gives up or gives misleading advice,
particularly if he cannot make his own questions understood.

Although there are a few statisticians who are trained in a number of fields
(or who at least understand the language of those fields) and who are interested
in applications, there is only one practicable solution to the problem of
statistical consultation, and that is for the research worker himself to master enough
of the concepts and language of mathematical statistics to perform most of the
structuring himself and to state his problems in mathematical or at least
semimathematical language. The following series of questions is intended to help
structure a research problem in statistical terms.

1. How can a population be defined such that the answer to the research
problem will lie in the form and/or parameters of the distribution of the
population? In general there will be more than one way of defining the
population unless the problem itself is unusually precisely specified. For
example, if a psychologist is interested in whether a given procedure A results
in more change in a certain direction than a given procedure B, he may
structure the problem into two populations (changes with procedure A and changes
with procedure B) or one population (pairs of changes, with method A and
with method B), and which of these is more suitable depends upon several
quite complicated considerations.

2. How can a random sample be drawn from the population? The answer
to this question will of course often influence the answer to the first, as some
populations are much more readily sampled than others. Although the first
condition for randomness (that each member of the population have equal
probability of being drawn) is frequently violated, it can very often be
plausibly argued that whatever bias is introduced is completely irrelevant to
the problem being studied. The violation of the second condition (that each
member be drawn independently) is usually more serious, because the
violation of this condition usually results in a spuriously low estimation of the
sampling error and consequently assertions are made at a spuriously low level
of confidence. The violation of the assumption of independence accounts for a
large proportion of statistical errors to be found in the literature.

3. What statistics will make possible a relevant inference about the form or
parameters of the population, and what are the sampling distributions of these
statistics? The answers to these questions are interdependent with
the answers to the first two. In attempting to answer them the question of
what assumptions can be made about the distribution of the population will
usually arise. It is with the last questions that one has reached the point that
it may be necessary, and perfectly appropriate, to ask a statistician in his own
language. He knows about the sampling distributions of many statistics,
where to find tables of them if available, and how to make approximations if
tables are not available. He may be able to derive the distribution of a
statistic if it is not already known. By structuring the problem himself, the
research worker can be prepared to give the statistician the information he
needs in order to be of any aid to the research worker.

MATHEMATICAL APPENDIX


B.1. Limit of a Sequence. The student will recognize intuitively that the
infinite sequence

$$1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots, \tfrac{1}{n}, \ldots$$

has the limit 0, even though no member of the sequence is exactly 0.
By saying that 0 is the limit, we seem to mean that in some sense the terms of
the sequence get closer and closer to 0 as $n$ gets larger and larger. There is a
certain vagueness in this statement, however; interpreted literally it does not
give a sufficient criterion for asserting that a sequence has 0 as a
limit. For example, the terms of the sequence

$$1 + .01, \tfrac{1}{2} + .01, \tfrac{1}{3} + .01, \ldots, \tfrac{1}{n} + .01, \ldots$$

also get closer and closer to 0, because each term is smaller than the preceding
one. Yet we recognize intuitively (and correctly) that the limit of this sequence
is .01, not 0. The same argument would apply if we substituted any
number, even a very small one, for .01; for example, the sequence with the
general term $1/n + .00001$ would approach .00001, not 0, as a limit.

By "closer," then, we must mean closer than any definite quantity
we can assign in advance, that is, closer than .01 or .00001 or even
$10^{-100}$. This statement gives us the clue to a precise definition. By the
statement that an infinite sequence approaches 0 as a limit, we mean that no
matter how small a number $\epsilon$ we choose in advance, the terms of the sequence
eventually get closer than $\epsilon$ to the limit 0. Even this statement is not quite
right, because we can produce a sequence that might be said to satisfy it and
yet does not have 0 as its limit, for example, the sequence

$$1, \tfrac{1}{2}, 1, \tfrac{1}{3}, 1, \tfrac{1}{4}, 1, \tfrac{1}{5}, \ldots$$

We can pick as small an $\epsilon$ as we like and some of the terms eventually get
smaller than $\epsilon$, but every other term is the number 1. Therefore we must
insist in our definition that all the terms eventually get closer than $\epsilon$ to 0, and
by "eventually" we must mean beyond some definite place in the sequence.
In other words, to say that a sequence

$$a_1, a_2, \ldots, a_n, \ldots$$

approaches 0 as a limit means that for any $\epsilon$, no matter how small, there exists
some term $a_N$, such that any term beyond $a_N$, that is, any term $a_n$ with $n > N$,
will differ from 0 by less than $\epsilon$ in absolute value. We can easily prove that
the sequence with $a_n = 1/n$ approaches 0, as follows: no matter what the size
of $\epsilon$, we can find an $N$ greater than $1/|\epsilon|$, implying $1/N < |\epsilon|$, but if $n > N$
then $1/n < 1/N$, thus $1/n < |\epsilon|$; therefore $1/n \to 0$ ("approaches zero").

The same arguments apply to any limit other than 0, and thus we arrive at
the following general definition:

The sequence $a_1, a_2, \ldots, a_n, \ldots$ approaches L as a limit (has the
limit L) if and only if for any positive $\epsilon$, no matter how small, there exists a
term $a_N$ (that is, a number N), such that for all $n > N$, $|a_n - L| < \epsilon$.

If we did not put absolute value marks around $a_n - L$, any sequence all of
whose terms beyond a certain point are less than L would meet our definition,
as $a_n - L$ would be negative and therefore less than the quantity $\epsilon$.
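The definition can be tried out numerically. The following is only a sketch: the helper names `find_N` and `eventually_within` are illustrative and do not come from the text, and checking finitely many terms can illustrate the definition but cannot prove that a limit exists.

```python
import math


def find_N(eps):
    """Return an integer N of about 1/eps or more, so that 1/n < eps for n > N."""
    return math.floor(1.0 / eps) + 1


def eventually_within(term, limit, eps, N, how_many=1000):
    """Check |a_n - limit| < eps for the `how_many` terms that follow a_N.

    This is a finite spot check, not a proof.
    """
    return all(abs(term(n) - limit) < eps for n in range(N + 1, N + 1 + how_many))


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.00001):
        N = find_N(eps)
        print(eps, N, eventually_within(lambda n: 1.0 / n, 0.0, eps, N))
    # The sequence 1/n + .01 never comes within .001 of 0 ...
    print(eventually_within(lambda n: 1.0 / n + 0.01, 0.0, 0.001, find_N(0.001)))
    # ... but it does come within .001 of .01.
    print(eventually_within(lambda n: 1.0 / n + 0.01, 0.01, 0.001, find_N(0.001)))
```

The second and third calls mirror the example of the sequence with general term 1/n + .01: it fails the test for the limit 0 and passes it for the limit .01.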

We shall now accept the consequences of our definition whether those
consequences satisfy our intuition or not. One consequence is that the sequence

\[ 0, 0, 0, \ldots, 0, \ldots \]

has the limit 0, as $|0 - 0| < \epsilon$.

B.2. Limit of a Series. An infinite sequence of additions of numbers is
called an infinite series; that is,

\[ a_1 + a_2 + \cdots + a_n + \cdots \]

in which $a_n$ is the nth term in the sequence of additions, is an infinite series.
An example is the harmonic series

\[ 1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots + \tfrac{1}{n} + \cdots \]

An infinite series is said to converge to a limit L if and only if the sequence of
partial sums

\[ a_1, \; a_1 + a_2, \; \ldots, \; \sum_{i=1}^{n} a_i, \; \ldots \]


approaches L as a limit. Surprisingly enough, a series can be nonconvergent
even though the terms of the series approach 0; for example, the harmonic
series is nonconvergent. To prove this, consider that we can divide the series
into the following segments:

\[ 1 \]
\[ \tfrac{1}{2} \]
\[ \tfrac{1}{3} + \tfrac{1}{4} \]
\[ \tfrac{1}{5} + \tfrac{1}{6} + \tfrac{1}{7} + \tfrac{1}{8} \]
\[ \cdots \]
\[ \frac{1}{2^m + 1} + \frac{1}{2^m + 2} + \cdots + \frac{1}{2^{m+1}} \]

Note that there are $2^{m+1} - 2^m = 2^m(2 - 1) = 2^m$ terms in each segment
(after the first two) and that the smallest of these terms is $1/2^{m+1}$; therefore,
the sum of each segment is at least $2^m(1/2^{m+1}) = \tfrac{1}{2}$. As there are infinitely
many such segments (we can take m as large as we like), it is evident that the
sum increases without limit.
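The segment argument can be checked with exact rational arithmetic. This is a minimal sketch; the function names `segment_sum` and `partial_sum` are illustrative only.

```python
from fractions import Fraction


def segment_sum(m):
    """Exact sum of the segment 1/(2**m + 1) + ... + 1/2**(m + 1)."""
    return sum(Fraction(1, k) for k in range(2**m + 1, 2 ** (m + 1) + 1))


def partial_sum(n):
    """Exact sum of the first n terms of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for m in range(6):
        s = segment_sum(m)
        # Each segment has 2**m terms and a sum of at least 1/2.
        print(m, 2**m, float(s), s >= Fraction(1, 2))
    # After the first 2**M terms the partial sum is at least 1 + M/2.
    for M in (2, 4, 8, 10):
        print(M, float(partial_sum(2**M)), 1 + M / 2)
```

Because every segment contributes at least 1/2, the partial sum after 2^M terms is at least 1 + M/2, which can be made as large as we please by taking M large.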

It is important to think of an infinite series as a sequence of additions rather
than the sum of an infinite class,¹ because the order in which the terms are
added is sometimes crucial. The alternating harmonic series,

\[ 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots + (-1)^{n+1}\tfrac{1}{n} + \cdots \]

converges to log 2. By rearranging terms, however, we can obtain a series
converging to any number whatsoever, or we can obtain a divergent series.

¹ Of course from a purely formal point of view the term "sum of an
infinite class" has no meaning anyway except the operational meaning given
by specifying an order of additions; however, many of the "paradoxes" of
mathematics have resulted from this unfortunate terminology. For example,
consider the following one: take any positive series converging to .01; cover the rational
points on the interval 0-1 in order, using the usual triangular ordering and
covering each point by placing over it a line segment equal in length to the
corresponding term in the series. Let the rational point lie at the center of each covering
segment; then we have the "paradox" of covering all members of a set of points
which are dense on the interval 0-1, each with a finite line segment, with the sum
of line segments only .01. The trouble lies in the word "all"; we should say only
that we can cover as many as we like, which does not imply that we can cover any
dense set at all. The conceptual difficulties associated with the notion of infinite
classes, and operations performed with them, lie at the basis of controversies about
the foundations of mathematics; these controversies entail interesting
psychological issues.

Not all nonconvergent series tend to infinity; for example the series

\[ 1 - 1 + 1 - 1 + \cdots + (-1)^{n+1} + \cdots \]

has the sequence of partial sums

\[ 1, 0, 1, 0, \ldots \]

which approaches no limit but never gets larger than 1.
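The effect of the order of addition can be seen numerically. The sketch below uses one standard rearrangement that is not given in the text: one positive term followed by two negative terms, which converges to half of log 2. All function names are illustrative.

```python
import math


def alternating_partial_sum(n):
    """First n terms of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))


def rearranged_partial_sum(blocks):
    """Same terms, reordered as 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...

    Each block holds one positive term followed by two negative terms.
    """
    total = 0.0
    for j in range(1, blocks + 1):
        total += 1.0 / (2 * j - 1) - 1.0 / (4 * j - 2) - 1.0 / (4 * j)
    return total


def oscillating_partial_sums(n):
    """Partial sums of 1 - 1 + 1 - 1 + ..., which approach no limit."""
    sums, total = [], 0
    for k in range(1, n + 1):
        total += (-1) ** (k + 1)
        sums.append(total)
    return sums


if __name__ == "__main__":
    print(alternating_partial_sum(100000), math.log(2))
    print(rearranged_partial_sum(100000), 0.5 * math.log(2))
    print(oscillating_partial_sums(6))
```

The first two lines of output show the same terms approaching two different values depending only on the order in which they are added.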

An excellent introduction to series can be found in Courant's Differential and 
integral calculus (1). 

B.3. Continuous Functions. We have already in the body of the text 
defined a function as a class of ordered pairs, such that if any two pairs have 
the same second member they must have the same first member (though the 
converse does not hold). When we say y is a function of x we mean that for 
every value of x (for which the function is defined) there is one and only one 
value of y (first member). Functions can be given by rules like $y = x^2$ for
$-\infty < x < \infty$ or like

\[ y = \begin{cases} 0 & \text{if } x \text{ is } 0 \\ x & \text{if } x \text{ is a rational number} \\ 1/x & \text{if } x \text{ is an irrational number} \end{cases} \]

A rational number is one that is equal to the ratio of two integers; for
example, 1/2 is rational. An irrational number is one that cannot be so
represented, for example, $\sqrt{2}$, $\pi$, and e. Both rational and irrational numbers
are dense on the number axis, that is, any finite interval contains infinitely 
many of each; thus the function just described would be very hard to visualize. 
Consider, for example, trying to graph it; our graph would look something like 
Fig. B.3.1. It may appear from this figure that for each value of x there are 
two, not one, values of y . This results from the density of both rational and 
irrational numbers; actually corresponding to each point on the x axis there is 
only one point on the y axis; for example above the point 2, which is rational, 
only the point on the straight line "counts" as part of the function.
Functions like this do not behave in the way we naively think functions ought to,
which is to let y change continuously from one value to another as x changes 
continuously. The intuitive concept of continuity, like the intuitive concept 
of limit, is vague. By “continuous” we seem to mean that there are no 
jumps in y as the value of x changes. By using the limit concept we can make 
this notion of “no jumps” precise as follows: 

Definition. A function f is a continuous function of x at the point $x = x_1$
if and only if f is defined for $x_1$ and for every positive number $\epsilon$, no matter
how small, a positive number $\delta$ can be found such that if $|x - x_1| < \delta$ then
$|f(x) - f(x_1)| < \epsilon$.

This definition means that we can restrict f(x) to a preassigned interval
around $f(x_1)$ that is as small as we wish to make it by restricting x to a small
enough interval around $x_1$.

Definition. A function f is a continuous function of x in the interval
a < x < b if and only if f is continuous at every point in this interval.
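As an illustration of the first definition, the sketch below takes the rule y = x^2 and, for a given epsilon, proposes a delta that works at a point x1. The choice delta = min(1, epsilon / (2|x1| + 1)) is one possible choice made for this example, not something stated in the text, and the function names are hypothetical.

```python
def delta_for_square(x1, eps):
    """A delta that works for f(x) = x**2 at x1 (one choice among many).

    If |x - x1| < delta <= 1, then |x + x1| < 2|x1| + 1, so
    |x**2 - x1**2| = |x - x1| * |x + x1| < delta * (2|x1| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x1) + 1))


def check_delta(x1, eps, samples=1000):
    """Spot-check the definition on sample points inside the delta interval."""
    d = delta_for_square(x1, eps)
    for i in range(1, samples):
        offset = d * i / samples  # 0 < offset < d
        for x in (x1 - offset, x1 + offset):
            if abs(x * x - x1 * x1) >= eps:
                return False
    return True


if __name__ == "__main__":
    for x1 in (0.0, 2.0, -5.0):
        for eps in (1.0, 0.01, 0.00001):
            print(x1, eps, delta_for_square(x1, eps), check_delta(x1, eps))
```

Note that the delta found depends on both epsilon and the point x1, which the definition allows.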
In the following discussion we shall be concerned only with continuous
functions; in fact, we shall for the most part be concerned only with a special 
subclass of these functions, those that are differentiable (see Sec. B.5). In 
advanced mathematics discontinuous functions are of great importance. 



Fig. B.3.1. An attempted graph of the function given by the rule

\[ y = \begin{cases} 0 & \text{if } x \text{ is } 0 \\ x & \text{if } x \text{ is rational} \\ 1/x & \text{if } x \text{ is irrational} \end{cases} \]

B.4. The Definite Integral. Consider any function f continuous in the
interval a < x < b. As an aid in the following discussion we shall represent f
by Fig. B.4.1, although our discussion will hold for any continuous function
whatsoever, including those which dip below the x axis.

Consider the interval a-b divided into n intervals, each of length $\Delta x$.
Within each interval i take any value $x_i$ and form the sum

\[ f(x_1)\,\Delta x + f(x_2)\,\Delta x + \cdots + f(x_n)\,\Delta x = \sum_{i=1}^{n} f(x_i)\,\Delta x \]

Now consider the sequence

\[ \sum_{i=1}^{1} f(x_i)\,\Delta x, \; \sum_{i=1}^{2} f(x_i)\,\Delta x, \; \ldots, \; \sum_{i=1}^{n} f(x_i)\,\Delta x, \; \ldots \]

It can be proved that for any continuous f this infinite sequence approaches a
limit. As a matter of fact, a limit is approached even if the intervals are not
equal (for a given n); it is necessary only that the longest interval $\Delta x$ in each
division approach 0 as n approaches infinity. A proof of this important
theorem can be found in any rigorous textbook of the calculus, for example,
Courant's Calculus (1).


Fig. B.4.1. Area under a continuous curve approximated by rectangles.

The limit approached by this sequence is called the definite integral of the
function f in the interval a < x < b and is written with the sign $\int$; that is,

\[ \int_a^b f(x)\,dx = \lim_{\substack{n \to \infty \\ \Delta x \to 0}} \sum_{i=1}^{n} f(x_i)\,\Delta x \]

The symbol dx is used to indicate that $\Delta x \to 0$.

Note that if we use our intuitive concept of the area under the curve in Fig.
B.4.1, the sum $\sum_{i=1}^{n} f(x_i)\,\Delta x$ is an approximation to that area, which tends to
get better and better as $n \to \infty$ and $\Delta x \to 0$, because the "corners" which get
included in the calculation get smaller and smaller. The limit $\int_a^b f(x)\,dx$ can
therefore be used as a definition of the area under the curve, as it satisfies our
intuitive concept of area.
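The sequence of sums can be computed directly. In the sketch below, the function name `riemann_sum` and the choice of the midpoint of each interval as the value x_i are assumptions made for illustration; the text allows any value within each interval.

```python
def riemann_sum(f, a, b, n):
    """Sum of f(x_i) * dx over n equal intervals, with x_i at each midpoint."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def square(x):
    return x * x


if __name__ == "__main__":
    # For f(x) = x**2 on the interval 0-3 the sums approach 9.
    for n in (1, 10, 100, 1000):
        print(n, riemann_sum(square, 0.0, 3.0, n))
```

As n grows the printed sums settle toward 9, the value of the definite integral of x^2 from 0 to 3.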

B.5. The Derivative. If f(x) changes continuously from f(a) to f(b) as x
changes from a to b, we might take the ratio $[f(b) - f(a)]/(b - a)$ as a kind of "average"
rate of change of f(x) in this interval. However, this average rate might be
very different from the average rates of change of much smaller segments of
the interval a-b. For example, Fig. B.5.1 shows a continuous function which
has an "average" rate of change of 0 for the interval a-b; yet for the small
segment c-d the function is changing rapidly in one direction and for the
segment e-b it is changing rapidly in the other direction. For this particular
function, we might take a large number of small intervals each of length $\Delta x$,
within each of which the rate of change is almost constant (in which case the
curve could be approximated within each interval by a straight line), so that
the "average" rate as defined previously would give adequate information
about each segment. This would be not only laborious but also imprecise;
further, it would be necessary to study each function in detail in order to
segment it properly.


Fig. B.5.1.



A solution of the rate-of-change problem is achieved by considering the rate
of change at a point. Let $\Delta x = x - k$ and form the ratio $\frac{f(x) - f(k)}{\Delta x}$. Now
keeping k fixed, let x approach k in any manner whatsoever, so that $\Delta x$
approaches 0, through positive or negative values; f(x) will of course approach
f(k), as we are assuming that the function is continuous at the point k. If the
ratio $\frac{f(x) - f(k)}{\Delta x}$ approaches a limit L, regardless of the manner in which $\Delta x$
approaches 0, then L is called the derivative of f with respect to x at the point k.
To say that $\frac{f(x) - f(k)}{\Delta x}$ approaches L as $\Delta x$ approaches 0 means that for any
positive number $\epsilon$ a positive number $\delta$ exists such that if $|\Delta x - 0| < \delta$
(excluding the case $\Delta x = 0$) then

\[ \left| \frac{f(x) - f(k)}{\Delta x} - L \right| < \epsilon. \]

The derivative is abbreviated $\frac{df}{dx}$ or $\frac{d}{dx} f(x)$ or $f'(x)$ or $\frac{dy}{dx}$ [if y = f(x)],
sometimes simply $D_x f$ or $D_x f(x)$.
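The ratio in this definition can be evaluated for smaller and smaller increments. The following sketch uses an illustrative helper named `difference_quotient`; the two test functions are chosen for the example and are not taken from the text.

```python
def difference_quotient(f, k, dx):
    """The ratio [f(k + dx) - f(k)] / dx for a nonzero increment dx."""
    if dx == 0:
        raise ValueError("the case dx = 0 is excluded by the definition")
    return (f(k + dx) - f(k)) / dx


def square(x):
    return x * x


if __name__ == "__main__":
    # For f(x) = x**2 at k = 3 the ratio approaches 6 from either side.
    for dx in (0.1, -0.1, 0.001, -0.001, 1e-6, -1e-6):
        print(dx, difference_quotient(square, 3.0, dx))
    # For f(x) = |x| at k = 0 the ratio is +1 or -1 depending on the sign
    # of dx, so no single limit is approached and no derivative exists there.
    for dx in (1e-6, -1e-6):
        print(dx, difference_quotient(abs, 0.0, dx))
```

The second example shows why the definition insists that the same limit be approached regardless of the manner in which the increment approaches 0.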
By using the definition and the four-step rule given in the text, the student
should try to prove the following theorems as exercises.

Theorem B.5.1. $\frac{d}{dx} c = 0$. (The derivative of a constant is zero.)

Theorem B.5.2. $\frac{d}{dx} x = 1$. (The derivative of a variable with respect to
itself is one.)

Theorem B.5.3. $\frac{d}{dx}\, c f(x) = c \frac{d}{dx} f(x)$. (The derivative of a constant times
a function is equal to the constant times the derivative of the function.)
Note that this theorem is a special case of Theorem B.5.5. Note also the
special case $\frac{d}{dx} cx = c$.

Theorem B.5.4. $\frac{d}{dx} [f(x) + g(x)] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$. (The derivative
of the sum of two functions is the sum of the derivatives.)

Theorem B.5.5. $\frac{d}{dx} [f(x) g(x)] = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x)$. (The
derivative of a product of two functions is the first function times the derivative of
the second plus the second times the derivative of the first.)

Theorem B.5.6. $\frac{d}{dx} [f(x)]^c = c[f(x)]^{c-1} \frac{d}{dx} f(x)$, in which c is any constant.
Note the special case $\frac{d}{dx} x^c = cx^{c-1}$.


Theorem B.5.7. $\frac{d}{dx} \frac{f(x)}{g(x)} = \frac{g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x)}{[g(x)]^2}$.

Proofs of the following theorems are more difficult, but they can be found
in almost any calculus textbook.

Theorem B.5.8. $\frac{d}{dx} \log f(x) = \frac{1}{f(x)} \frac{d}{dx} f(x)$. Note the special case
$\frac{d}{dx} \log cx = \frac{1}{cx}\, c = \frac{1}{x}$.

Theorem B.5.9. $\frac{d}{dx} e^{f(x)} = e^{f(x)} \frac{d}{dx} f(x)$. Note the special case
$\frac{d}{dx} e^{cx} = ce^{cx}$.

Many other theorems can be found in any calculus textbook; in fact, the
process of differentiation has been reduced to a purely mechanical calculation
for all, or almost all, functions encountered in applications. In courses in the
calculus the student usually gets sufficient practice so that he can differentiate
about as rapidly as he can do simple arithmetic.

The derivative is itself a function and, in the case of almost all functions
encountered in applications, can itself be differentiated, so that we obtain the
derivative of a derivative. For example, if $f(x) = x^2$, then $\frac{d}{dx} f(x) = 2x$, and
$\frac{d}{dx} \left[ \frac{d}{dx} f(x) \right] = \frac{d}{dx} 2x = 2$. The second derivative is abbreviated $\frac{d^2}{dx^2} f(x)$ or
$f''(x)$. Similarly, we can usually obtain $\frac{d^3}{dx^3} f(x), \ldots, \frac{d^n}{dx^n} f(x)$, written
$f'''(x), \ldots, f^{(n)}(x)$, etc. In the example $f(x) = x^2$, $\frac{d^2}{dx^2} f(x) = 2$ and
$\frac{d^n}{dx^n} f(x) = 0$ for $n \geq 3$.

Whereas the first derivative is the rate of change of f(x) with respect to x,
the second derivative is the rate of change of the rate of change. For example,
consider the increasing volume of a spherical balloon as air is blown into it.
What is the rate of change of the volume as a function of the radius? The
volume V is given by the formula $V = \tfrac{4}{3}\pi R^3$. Therefore, $\frac{dV}{dR} = 4\pi R^2$, which
implies that a given increase in R is accompanied by a much larger increase in
V when R is large than when R is small. The rate of the rate of increase is
$8\pi R$, that is, not only is the volume increasing at an increasing rate, but the
increase in the rate is itself increasing with R. On the other hand, the rate of
increase in the rate of increase in the rate of increase in V is the third
derivative, the constant $8\pi$; this third derivative does not change with R.
If air is being blown into the balloon at the rate of k units per second, we can
write V as a function of the time t in seconds; V = kt. Then $\frac{dV}{dt} = k$; that is,
the rate of increase of V with respect to time is a constant, as is intuitively
evident since air is being blown in at a constant rate.

If an object moves in a straight line so that we can write its distance from
the point of origin as a function of the time of travel, that is, D = f(t), in
which D is distance and t is the time in some appropriate units, then $\frac{dD}{dt}$ is the
instantaneous velocity of the object. For example, if $D = ct^2$, then $\frac{dD}{dt} = 2ct$,
and the velocity at the time t = 5 is 10c. The second derivative, $\frac{d^2D}{dt^2}$, is the
acceleration, that is, the rate of change of the velocity.

A geometrical interpretation of the derivative is suggested by considering
the linear function y = ax + b. The reader will recall that the slope of a
straight line is defined as the tangent of the angle formed with the x axis, as
illustrated in Fig. B.5.2. The tangent of $\alpha$ is the ratio of b to b/a; thus the
tangent is a. Note, however, that the derivative is also a, for $\Delta y/\Delta x$ is equal
to a for every size $\Delta x$ taken from any point $x_1$; therefore $\lim_{\Delta x \to 0} \Delta y/\Delta x = a$.
Thus the derivative is the slope of the straight line. Now consider the curve
given by y = f(x), illustrated in Fig. B.5.3. The slope of this curve at the
point P is the slope of the tangent at that point, and the tangent is defined as
the limiting position of the secant PQ as Q approaches P. Note, however,
that PQ is the hypotenuse of the right triangle whose sides are $\Delta y$ and $\Delta x$.
Thus, the slope of the tangent is $\lim_{\Delta x \to 0} \Delta y/\Delta x$, which is the derivative. Thus
the derivative at a given point, if it exists, is the slope of a curve at that point.

Suppose that in a given interval of the x axis f(x) rises to a maximum point
and then decreases. When f(x) is increasing the derivative is positive; when
f(x) is decreasing the derivative is negative. At the maximum point f(x) is
neither increasing nor decreasing; therefore, if the derivative exists at the
maximum point, it must have the value zero. This is intuitively evident from
the geometrical interpretation, for the slope at the maximum is zero unless the
maximum is a "sharp point" on the curve, in which case the derivative at that
point does not exist. We can also prove this assertion by the following
argument. If f(x) is a maximum at the point k, then $f(x) - f(k)$ will be negative
for all nearby $x \neq k$, and $\frac{f(x) - f(k)}{\Delta x}$ will be positive or negative as $\Delta x$ is negative or
positive. Therefore, $\frac{f(x) - f(k)}{\Delta x}$ cannot be positive if $\Delta x$ approaches 0
through positive values and cannot be negative if $\Delta x$ approaches 0 through
negative values. Therefore, if the limit (derivative) exists, it must be 0.

Figure B.5.4 illustrates a maximum point of each kind; the derivative at $x_1$
is 0 and the derivative at $x_2$ does not exist.
A similar argument holds for minimum values of f(x), that is, the derivative
at a minimum is 0, if it exists. Note that minima and maxima are defined in
terms of their neighborhoods (nearby values); for example, both $f(x_5)$ and $f(x_4)$
are minima, although $f(x_5) < f(x_4)$ and both are larger than values of f(x) on
the left side of the figure.

We have said that if f(k) is a maximum or minimum and if f'(k) exists, then
f'(k) = 0. On the other hand, it is not true that if f'(k) = 0 then f(k) is
necessarily a maximum or minimum. For example, in Fig. B.5.4 $f'(x_3) = 0$
yet $f(x_3)$ is neither a maximum nor a minimum.



To find the maximum and minimum values of f(x) we need merely
differentiate, set f'(x) = 0, solve for the values of x satisfying this equation, and
then examine f(x) in the neighborhood of each solution to see whether the
solution gives a maximum, a minimum, or neither. For example, consider the
function given by $f(x) = x^3/3 - x^2 - 3x + 3$. Then $f'(x) = x^2 - 2x - 3$.
Solving $x^2 - 2x - 3 = 0$ we obtain 3 and -1 as solutions. Now compare
f(3) with $f(3 + \Delta)$ and $f(3 - \Delta)$, where $\Delta$ is small and positive.

\[ f(3) = 9 - 9 - 9 + 3 = -6 \]
\[ f(3 + \Delta) = \frac{27 + 27\Delta + 9\Delta^2 + \Delta^3}{3} - (9 + 6\Delta + \Delta^2) - (9 + 3\Delta) + 3 \]
\[ = -6 + 2\Delta^2 + \Delta^3/3 > f(3) \quad \text{as } \Delta > 0 \]
\[ f(3 - \Delta) = \frac{27 - 27\Delta + 9\Delta^2 - \Delta^3}{3} - (9 - 6\Delta + \Delta^2) - (9 - 3\Delta) + 3 \]
\[ = -6 + 2\Delta^2 - \Delta^3/3 > f(3) \quad \text{for } \Delta < 1 \]

Therefore, f(3) is a minimum. Similarly,

\[ f(-1) = -\tfrac{1}{3} - 1 + 3 + 3 = 4\tfrac{2}{3} \]
\[ f(-1 + \Delta) = \frac{-1 + 3\Delta - 3\Delta^2 + \Delta^3}{3} - (1 - 2\Delta + \Delta^2) + (3 - 3\Delta) + 3 \]
\[ = 4\tfrac{2}{3} - 2\Delta^2 + \Delta^3/3 < f(-1) \quad \text{for } \Delta < 1 \]
\[ f(-1 - \Delta) = \frac{-1 - 3\Delta - 3\Delta^2 - \Delta^3}{3} - (1 + 2\Delta + \Delta^2) + (3 + 3\Delta) + 3 \]
\[ = 4\tfrac{2}{3} - 2\Delta^2 - \Delta^3/3 < f(-1) \]

Therefore f(-1) is a maximum.

We could have saved a considerable amount of computation by considering
that if a function changes from increasing to decreasing the derivative is
changing from positive to negative and is therefore decreasing; thus the second
derivative must be negative in this neighborhood. Similarly, if the function
changes from decreasing to increasing the second derivative must be positive.
These arguments can be extended; if the first derivative at a given point is 0
and the second derivative is positive, then the first derivative is at that point
changing from negative to positive (instead of, for example, merely decreasing
to 0 and then increasing again, for then we would have to the left of the point a
negative second derivative and on the right a positive second derivative), and
the value of the function at that point is a minimum. Similarly, if f'(k) = 0
and f''(k) < 0, then f(k) is a maximum.

In the preceding example, f''(x) = 2x - 2; therefore, f''(3) = 4 and since
f'(3) = 0, f(3) is a minimum, as we found from our laborious computation.
Also f''(-1) = -4, and since f'(-1) = 0, f(-1) is a maximum.
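The example can be reproduced in a few lines. This is a sketch only: the derivatives are typed in by hand from the text rather than computed symbolically, and the function names are illustrative.

```python
def f(x):
    return x**3 / 3 - x**2 - 3 * x + 3


def f_prime(x):
    return x**2 - 2 * x - 3


def f_second(x):
    return 2 * x - 2


def classify(k):
    """Apply the second-derivative test at the point k."""
    if f_prime(k) != 0:
        return "first derivative is not 0"
    if f_second(k) > 0:
        return "minimum"
    if f_second(k) < 0:
        return "maximum"
    return "test is inconclusive"


if __name__ == "__main__":
    for k in (3, -1):
        print(k, f(k), f_prime(k), f_second(k), classify(k))
    # Direct comparison with neighboring values, as in the longer computation.
    d = 0.1
    print(f(3 - d) > f(3) < f(3 + d))
    print(f(-1 - d) < f(-1) > f(-1 + d))
```

Both approaches agree: the point 3 gives a minimum and the point -1 gives a maximum.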

Note that if f'(k) = 0 and f''(k) = 0 we have a point on the curve such as
$f(x_3)$ in Fig. B.5.4. Any point k such that f''(k) = 0 is called a point of
inflection. If k is a point of inflection, it is not, of course, necessarily true that
f'(k) = 0. The point $x_6$ in Fig. B.5.4 is a point of inflection, but in this case
the first derivative changes from increasing to decreasing but remains positive
throughout.

B.6. Primitive Functions. Consider any differentiable function f and its
derivative f'. The function f is called a primitive function of f'. We say "a"
rather than "the" because f' is always the derivative of more than one
function. In fact, any derivative has infinitely many primitive functions; let
h(x) = f(x) + c; then if f' is the derivative of f it is also the derivative of h, as
c' = 0.

The differentiable function f is itself the derivative of some function; as a
matter of fact, any continuous function has a primitive function and therefore
infinitely many. Let G be any function such that G'(x) = f(x); then let
H(x) = G(x) + c; then H'(x) = f(x) also, that is, any two functions differing
only by a constant are primitive functions of the same function. The converse is also
true: any two primitive functions of the same function must differ only by
a constant. Let G'(x) = H'(x). Then let U(x) = G(x) - H(x). Then
U'(x) = G'(x) - H'(x) = 0, that is, the rate of change of U with respect to x
is everywhere zero; this can happen, however, only if G(x) and H(x) differ only
by a constant.

B.7. Indefinite Integrals. Consider any function f that is continuous over
the range $-\infty < x < \infty$. We know already that the definite integral
$\int_a^b f(x)\,dx$ exists, where a < b. Note, however, that this integral exists for all
values of b, letting b range over the same values as x. This means that the
value of $\int_a^b f(x)\,dx$ depends upon (is a function of) b, considered as a variable.
Substituting t for b (as the letter b ordinarily stands for a constant), we have
$\int_a^t f(x)\,dx$, where a < t. Obviously it makes no difference what letter we use
for the variable following the integral sign, that is, we can write

\[ \int_a^t f(x)\,dx = \int_a^t f(y)\,dy \]

meaning that if f remains the same the integrals have the same value. For
example, $\int_a^t x^2\,dx = \int_a^t y^2\,dy$; $\int_a^t (1/z)\,dz = \int_a^t (1/w)\,dw$. If this is not clear
the student should expand the integrals into the original limit notation and
satisfy himself that our equations hold.

It is convenient in defining the indefinite integral of f(x) to use the notation
$\int_a^x f(u)\,du$ and to abbreviate this expression by F(x); that is, $F(x) = \int_a^x f(u)\,du$.

B.8. Fundamental Theorem of the Calculus. The fundamental theorem
of the calculus states that the derivative of the indefinite integral

\[ F(x) \left[ = \int_a^x f(u)\,du \right] \]

is equal to the value of f(u) at the point x, that is, F'(x) = f(x).

If we remember that F(x) is the area under the curve y = f(u) between the
limits a and x, as represented in Fig. B.8.1, this theorem is intuitively plausible,
for it states that the rate of change of the area at the point x is simply f(x), the
height of the curve.

Let M and m be respectively the maximum and minimum values of f(u) in
the interval between x and $x + \Delta u$. Then the area $F(x + \Delta u) - F(x)$ lies
between $M\,\Delta u$ and $m\,\Delta u$, that is,

\[ m\,\Delta u \leq F(x + \Delta u) - F(x) \leq M\,\Delta u \]

Therefore

\[ m \leq \frac{F(x + \Delta u) - F(x)}{\Delta u} \leq M \]

Therefore

\[ \lim_{\Delta u \to 0} m \leq \lim_{\Delta u \to 0} \frac{F(x + \Delta u) - F(x)}{\Delta u} \leq \lim_{\Delta u \to 0} M \]

As $\Delta u$ approaches 0, both m and M approach f(x), because f is continuous
at x; that is,

\[ f(x) \leq F'(x) \leq f(x), \quad \text{implying } F'(x) = f(x) \]

This important theorem states that F(x) is a primitive function of f(x). As
shown in Sec. B.6, any two primitive functions can differ only by a constant.
Let G(x) be a primitive function of f(x), then F(x) = G(x) + c. Note that
$F(a) = \int_a^a f(u)\,du = 0$. Then F(a) = G(a) + c = 0. Therefore, the
indefinite integral

\[ \int_a^x f(u)\,du = F(x) = F(x) - 0 = F(x) - F(a) = G(x) + c - [G(a) + c] = G(x) - G(a) \]

If we now consider the point x as fixed, the preceding result shows that to
evaluate a definite integral $\int_a^b f(x)\,dx$ we need merely find a primitive function
F such that F' = f; the value of the integral is F(b) - F(a).

It is because the indefinite integral of f is a primitive function of f that the
customary notation $\int f(x)\,dx = F(x) + c$ is used. F is actually a primitive
function, and c, the "constant of integration," expresses the fact that any two
primitive functions can differ only by a constant.
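Both statements of the theorem can be checked numerically for a simple case. The sketch below assumes f(u) = u^2 with primitive function G(x) = x^3/3; the helper names and the midpoint rule used for the integral are illustrative choices, not part of the text.

```python
def integral(f, a, b, n=10000):
    """Approximate the definite integral by a midpoint sum over n intervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(u):
    return u * u


def G(x):
    """A primitive function of f, since G'(x) = x**2."""
    return x**3 / 3


def F(x, a=0.0):
    """The indefinite integral of f, taken from the lower limit a."""
    return integral(f, a, x)


if __name__ == "__main__":
    # The definite integral agrees with G(b) - G(a).
    print(integral(f, 1.0, 2.0), G(2.0) - G(1.0))
    # The rate of change of F at x = 2 is close to f(2) = 4.
    h = 1e-3
    print((F(2.0 + h) - F(2.0)) / h, f(2.0))
```

The constant of integration drops out of G(b) - G(a), so any primitive function of f gives the same value.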

B.9. Distribution Functions (Continuous Case). We define a cumulative
distribution function F as any function F with the following properties:

F is defined over the range $-\infty < x < \infty$, that is, for all values of x.

F is monotonic nondecreasing, that is, if $x_1 < x_2$, then $F(x_1) \leq F(x_2)$.

$\lim_{x \to \infty} F(x) = 1$.

$\lim_{x \to -\infty} F(x) = 0$.

If F is differentiable, its derivative f is a frequency function, or probability
density function. Note that if F(a) = 0, in which a is some value of x, then
f(x) = 0 for all x < a; similarly, if F(b) = 1, then f(x) = 0 for all x > b.

Note also that since F is monotonic nondecreasing, $f(x) \geq 0$ for all values
of x.

So far we have said nothing whatsoever about random variables or sampling
theory. The definitions of F and f are abstract mathematical definitions
which show the properties which functions must have in order to qualify as
distribution functions. The reasons for choosing these properties become
clear when we consider the informal meaning which we wish to give to F(x),
that is, the proportion of a population having values less than or equal to x.
Obviously if $x_1 < x_2$ the proportion having at most the value $x_1$ cannot be
greater than the proportion having at most the value $x_2$. Therefore, we lay
down the requirement that F is monotonic nondecreasing. As the total
proportion must be 1, we require that $\lim_{x \to \infty} F(x) = 1$; for an analogous reason we
impose the requirement that $\lim_{x \to -\infty} F(x) = 0$. It is not necessary to impose
the additional requirement that $F(x) \geq 0$, because this is deducible from the
other requirements.

The concept of a random variable X is introduced by the following
definitions:

Probability $(X \leq x) = F(x)$. [The probability that a random variable X
will assume a value less than or equal to x is F(x), where F has the properties
previously described.]

Probability $(a < X \leq b) = F(b) - F(a)$.

The probability density of the random variable X is defined as f(x), where f
is the derivative of F.
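These definitions can be made concrete with one particular distribution. The sketch below assumes the density f(x) = e^(-x) for x > 0 (the same density used as an example in Sec. B.12), whose cumulative distribution function is F(x) = 1 - e^(-x); the function names are illustrative.

```python
import math


def F(x):
    """Cumulative distribution function for the assumed example."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def f(x):
    """Probability density, the derivative of F (0 for x <= 0)."""
    return math.exp(-x) if x > 0 else 0.0


def prob_between(a, b):
    """Probability (a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)


if __name__ == "__main__":
    xs = [-2.0 + 0.1 * i for i in range(100)]
    # F never decreases as x increases.
    print(all(F(x1) <= F(x2) for x1, x2 in zip(xs, xs[1:])))
    # F approaches 0 on the left and 1 on the right.
    print(F(-50.0), F(50.0))
    print(prob_between(1.0, 2.0))
    # The difference quotient of F at x = 1 is close to the density f(1).
    h = 1e-6
    print((F(1.0 + h) - F(1.0)) / h, f(1.0))
```

Since F(0) = 0 in this example, the density is 0 for all x below 0, in line with the note in the text.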

B.10. Mathematical Expectation. The mathematical expectation of any
function of X, say g(X), is defined as

\[ E[g(X)] \equiv \int_R g(x) f(x)\,dx \]

Examples. 1. The expectation of X is $\int_R x f(x)\,dx$.

2. The expectation of $X^2$ is $\int_R x^2 f(x)\,dx$.

3. The expectation of X - c is $\int_R (x - c) f(x)\,dx$.

Note that if g and h are two different functions,

\[ E[g(X) + h(X)] = \int_R [g(x) + h(x)] f(x)\,dx = \int_R g(x) f(x)\,dx + \int_R h(x) f(x)\,dx \]
\[ = E[g(X)] + E[h(X)] \]

As the sum of two functions is itself a function, this implies that

\[ E\left[ \sum_{i=1}^{n} g_i(X) \right] = \sum_{i=1}^{n} E[g_i(X)] \]

for any functions $g_1, g_2, \ldots, g_n$. The student should satisfy himself that
E(aX + b) = aE(X) + b, in which a and b are constants. These properties
are summed up by the statement that E is a linear operator.
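The linearity of E can be checked numerically for a particular density. The sketch below again assumes f(x) = e^(-x) for x > 0, approximates the integral over R by a midpoint sum on the interval 0-50 (the part beyond 50 is negligible for this density), and uses the illustrative name `expectation`.

```python
import math


def expectation(g, upper=50.0, n=100000):
    """Approximate E[g(X)] for the assumed density f(x) = exp(-x), x > 0."""
    dx = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += g(x) * math.exp(-x)
    return total * dx


if __name__ == "__main__":
    e_x = expectation(lambda x: x)
    e_x2 = expectation(lambda x: x * x)
    print(e_x, e_x2)
    # E[g(X) + h(X)] = E[g(X)] + E[h(X)]
    print(expectation(lambda x: x + x * x), e_x + e_x2)
    # E(aX + b) = aE(X) + b with a = 3, b = 2
    print(expectation(lambda x: 3 * x + 2), 3 * e_x + 2)
```

The last two lines of output each print a pair of nearly equal numbers, one computed from the left side and one from the right side of the corresponding identity.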

B.11. Power Series. An infinite series of the form

\[ c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots + c_n x^n + \cdots \]

in which the c's are all constants is called a power series. There are many
functions which can be written as power series, by a suitable choice of the constant
coefficients. Let us assume that f is such a function, that is, that

\[ f(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n + \cdots \]

Let us also assume that the nth derivative of f exists for all n, and that we can
differentiate the terms of the infinite series just as we would a finite series.
Then

\[ f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + \cdots + nc_n x^{n-1} + \cdots \]
\[ f''(x) = 2c_2 + (3)(2)c_3 x + \cdots + (n)(n - 1)c_n x^{n-2} + \cdots \]
\[ \cdots \]
\[ f^{(n)}(x) = n!\,c_n + (n + 1)!\,c_{n+1} x + \cdots \]

Therefore,

\[ f(0) = c_0 \]
\[ f'(0) = c_1 \]
\[ f''(0) = 2c_2 \quad \text{or} \quad c_2 = \frac{f''(0)}{2!} \]
\[ \cdots \]
\[ f^{(n)}(0) = n!\,c_n \quad \text{or} \quad c_n = \frac{f^{(n)}(0)}{n!} \]

Therefore,

\[ f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!} x^2 + \cdots + \frac{f^{(n)}(0)}{n!} x^n + \cdots \]

This series is called a Taylor series, after Brook Taylor (1685-1731), who
discovered this method.

As an example consider the function given by $f(x) = e^x$. We have f(0) = 1;
also

\[ f'(x) = e^x \quad \text{and} \quad f'(0) = 1 \]
\[ f''(x) = e^x \quad \text{and} \quad f''(0) = 1 \]
\[ \cdots \]
\[ f^{(n)}(x) = e^x \quad \text{and} \quad f^{(n)}(0) = 1 \]

Therefore,

\[ e^x = 1 + x + x^2/2! + x^3/3! + \cdots + x^n/n! + \cdots \]
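The series for e^x can be summed term by term and compared with a library value. This is a minimal sketch; the name `exp_taylor` is illustrative.

```python
import math


def exp_taylor(x, n_terms):
    """Sum of the first n_terms terms of 1 + x + x**2/2! + x**3/3! + ..."""
    total, term = 0.0, 1.0
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turns x**n / n! into x**(n+1) / (n+1)!
    return total


if __name__ == "__main__":
    for n_terms in (1, 2, 3, 5, 10, 20):
        print(n_terms, exp_taylor(1.0, n_terms), math.exp(1.0))
```

With x = 1 the partial sums approach e, and a handful of terms already agree with the library value to several decimal places.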
B.12. Moment-generating Functions. Among the most important
properties of a distribution are its moments. The kth moment about the point c is
defined as $E[(X - c)^k]$. The kth moment about the origin is thus $E(X^k)$. It
is often convenient to find moments about the origin by means of the following
integral, called a moment-generating function:

\[ E(e^{tX}) = \int_R e^{tx} f(x)\,dx \]

To show that this integral provides a way of obtaining moments, we first
expand $e^{tX}$ in a Taylor series:

\[ E(e^{tX}) = E(1 + tX + t^2X^2/2! + \cdots + t^kX^k/k! + \cdots) \]

Remembering that E is a linear operator and assuming that we can treat
this particular infinite series as though it were finite (this can be proved by
advanced methods), we obtain

\[ E(e^{tX}) = 1 + tE(X) + \frac{t^2}{2!} E(X^2) + \cdots + \frac{t^k}{k!} E(X^k) + \cdots \]

Differentiating with respect to t, we obtain

\[ E(X) + tE(X^2) + \frac{t^2}{2!} E(X^3) + \cdots + \frac{t^{k-1}}{(k - 1)!} E(X^k) + \cdots \]

For t = 0 the value of this series is E(X). Differentiating twice, we obtain

\[ E(X^2) + tE(X^3) + \cdots + \frac{t^{k-2}}{(k - 2)!} E(X^k) + \cdots \]

For t = 0 this becomes $E(X^2)$. In general, by differentiating k times and
setting t = 0 we obtain $E(X^k)$.

As an example, the moment-generating function of $f(x) = e^{-x}$, x > 0, is

\[ E(e^{tX}) = \int_0^\infty e^{tx} e^{-x}\,dx = \int_0^\infty e^{x(t-1)}\,dx = \left. \frac{e^{x(t-1)}}{t - 1} \right|_0^\infty \]

As we are interested only in small values of t, we can consider t less than 1; the
evaluation of the definite integral is therefore

\[ 0 - \frac{1}{t - 1} = (1 - t)^{-1} \]
\[ \frac{d}{dt} (1 - t)^{-1} = -1(1 - t)^{-2}(-1) = (1 - t)^{-2} \]
\[ \frac{d^2}{dt^2} (1 - t)^{-1} = 2(1 - t)^{-3} \]

Then

\[ E(X) = (1 - 0)^{-2} = 1 \]
\[ E(X^2) = 2(1 - 0)^{-3} = 2 \]
\[ \mu = 1 \]
\[ \sigma^2 = E(X^2) - [E(X)]^2 = 2 - 1 = 1 \]
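The same moments can be recovered numerically from the moment-generating function $(1 - t)^{-1}$. In this sketch the derivatives at t = 0 are approximated by central differences with a small step h, which is an assumption of the example rather than the method of the text; the function names are illustrative.

```python
import math


def mgf(t):
    """Moment-generating function (1 - t)**(-1) of f(x) = exp(-x), for t < 1."""
    if t >= 1:
        raise ValueError("the integral converges only for t less than 1")
    return 1.0 / (1.0 - t)


def mgf_by_integration(t, upper=60.0, n=100000):
    """Midpoint-sum approximation of the integral of exp(t*x) * exp(-x)."""
    dx = upper / n
    return sum(math.exp((t - 1.0) * (i + 0.5) * dx) for i in range(n)) * dx


def moment(order, h=1e-3):
    """Approximate the first or second derivative of the mgf at t = 0."""
    if order == 1:
        return (mgf(h) - mgf(-h)) / (2 * h)
    if order == 2:
        return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2
    raise ValueError("only orders 1 and 2 are sketched here")


if __name__ == "__main__":
    print(mgf(0.5), mgf_by_integration(0.5))
    mean = moment(1)
    second = moment(2)
    print(mean, second, second - mean**2)
```

The printed mean, second moment, and variance are close to 1, 2, and 1, matching the values obtained by differentiation above.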

If two continuous distributions have the same moment-generating function
it is evident that they have the same moments. If they have the same
moments, the two densities are the same. We shall prove this for the special
case in which the difference of the densities can be expanded in a power series.
Let the expansion of the difference be

\[ f(x) - g(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n + \cdots \]

Then

\[ \int_R [f(x) - g(x)]^2\,dx = \int_R (c_0 + c_1 x + \cdots)[f(x) - g(x)]\,dx \]
\[ = c_0 \int_R [f(x) - g(x)]\,dx + c_1 \int_R x[f(x) - g(x)]\,dx + \cdots \]
\[ = c_0(1 - 1) + c_1[E(X) - E(X)] + \cdots = 0 \]

as all moments are assumed to be equal. But $[f(x) - g(x)]^2$ is
either positive or zero, and since the integral is zero, it follows that $[f(x) - g(x)]^2$
is zero for every value of x and therefore that f(x) = g(x) for all values. Thus
if two distributions have the same moment-generating function the density
functions are the same (though we have proved this only for a special case).
It is this theorem which makes moment-generating functions so useful in
statistical theory, for often it is much easier to prove that two distributions
have the same moment-generating function than to prove directly that their
density functions are equal.

In theoretical work it is also useful to consider moment-generating functions
of functions of X. Let g be any function; then the moment-generating
function of g is defined as $E(e^{tg(X)}) = \int_R e^{tg(x)} f(x)\,dx$.

B.13. Change of Variable. A device of great usefulness in many of the
proofs that follow is that of change of variable. Suppose the distribution of
the random variable X is given by f(x). Now let Y be some monotonic
increasing function of X, that is, Y = g(X). What is the probability density
of Y?

The probability that X will assume a value between x and $x + \Delta x$ is equal
to the probability that Y will assume a value between y and $y + \Delta y$, where
y = g(x) and $y + \Delta y = g(x + \Delta x)$, that is,

$$\text{prob}(x < X < x + \Delta x) = \text{prob}(y < Y < y + \Delta y)$$

Then

$$f(y) \equiv \lim_{\Delta y \to 0} \frac{\text{prob}(y < Y < y + \Delta y)}{\Delta y} = \lim_{\Delta y \to 0} \frac{\text{prob}(x < X < x + \Delta x)}{\Delta y} = \lim_{\Delta y \to 0} \frac{\text{prob}(x < X < x + \Delta x)}{\Delta x}\,\frac{\Delta x}{\Delta y}$$

Note that as $\Delta y \to 0$, $\Delta x \to 0$; thus the preceding expression is equal to¹

$$\lim_{\Delta x \to 0} \frac{\text{prob}(x < X < x + \Delta x)}{\Delta x}\,\lim_{\Delta y \to 0} \frac{\Delta x}{\Delta y} = f(x)\,\frac{dx}{dy}$$

We placed the restriction that Y be monotonic increasing in order that dx/dy be
positive, as obviously f(y) must be positive or zero. This restriction can be
removed by placing absolute value marks around dx/dy, that is, it can be proved
that if Y is a monotonic function of X, $f(y) = f(x)\left|\frac{dx}{dy}\right|$.
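As a numerical illustration of the change-of-variable rule, the sketch below takes the density $f(x) = e^{-x}$, x > 0, and the transformation $Y = X^2$, which is monotonic increasing for x > 0. Both the choice of transformation and the helper names are assumptions made for illustration. The probability that X lies between 1 and 2 should equal the probability that Y lies between 1 and 4.

```python
import math


def f_x(x: float) -> float:
    """Density of X: e^(-x) for x > 0."""
    return math.exp(-x) if x > 0 else 0.0


def f_y(y: float) -> float:
    """Density of Y = X^2 obtained from f(y) = f(x) |dx/dy| with x = sqrt(y)."""
    x = math.sqrt(y)
    dx_dy = 1.0 / (2.0 * x)
    return f_x(x) * abs(dx_dy)


def integrate(func, lower: float, upper: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation to a definite integral."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


prob_x = integrate(f_x, 1.0, 2.0)  # prob(1 < X < 2)
prob_y = integrate(f_y, 1.0, 4.0)  # prob(1 < Y < 4), the same event expressed in Y
```

The two integrals agree to within the accuracy of the midpoint rule, which is what the equality of the two probabilities at the start of the argument requires.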

B.14. Multiple Integration. Before discussing random sampling it is
necessary to introduce multiple integration. Let z be a function of x and y,
that is, z = f(x,y). Suppose further that z is continuous for each point (a,b) in
the xy plane, that is, $\lim_{x \to a,\, y \to b} f(x,y) = f(a,b)$. Just as we defined
the area under a curve as a definite integral, so we define the volume under a
surface as a definite integral. Consider finding the volume between the xy plane
and that part of the surface indicated in Fig. B.14.1. This will be the volume of
the solid obtained by dropping perpendiculars from all points of the curve to the
xy plane. We can make an approximation by dividing the solid into n slices
parallel to the yz plane, each of length $\Delta x$, finding the approximate volume of
each slice, and then adding the n volumes together. Consider the approximation
to the volume of the ith slice, indicated in Fig. B.14.1. Divide this slice
into m segments, each of length $\Delta y$. The volume of the ith slice will be
approximated by the sum of the volumes of m prisms. The volume of the jth
prism is $f(x_i, y_j)\,\Delta x\,\Delta y$, in which $x_i$ is any value of x in the ith slice and
$y_j$ is any value of y in the jth segment. For the ith slice both $\Delta x$ and $x_i$ are
constants, and the approximate volume is $\sum_{j=1}^{m} f(x_i, y_j)\,\Delta x\,\Delta y$. A better
approximation is $\lim_{m \to \infty,\, \Delta y \to 0} \sum_{j=1}^{m} f(x_i, y_j)\,\Delta x\,\Delta y$. Remembering that
$x_i$ and $\Delta x$ are constants for the ith slice, this limit is a definite integral
$\Delta x \int_{y_L}^{y_U} f(x_i, y)\,dy$, in which the limits of integration $y_U$ and $y_L$ are
functions of x, that is, will depend upon the particular value $x_i$. The
approximate sum of all slices is $\sum_{i=1}^{n} \Delta x \int_{y_L}^{y_U} f(x_i, y)\,dy$.

¹ This assumes that the limit of this product is the product of the respective
limits. This can be proved by advanced methods.

As thinner and thinner slices are taken, that is, as $n \to \infty$ and $\Delta x \to 0$, the
approximation becomes better, and we define the total volume as

$$\lim_{n \to \infty,\, \Delta x \to 0} \sum_{i=1}^{n} \Delta x \int_{y_L}^{y_U} f(x_i, y)\,dy$$

abbreviated $\int_a^b dx \int_{y_L}^{y_U} f(x,y)\,dy$ or $\int_a^b \left[\int_{y_L}^{y_U} f(x,y)\,dy\right] dx$ or
$\int_a^b \int_{y_L}^{y_U} f(x,y)\,dy\,dx$. In performing the integration we integrate first
with respect to y, treating x as a constant; then we integrate with respect to x.
We could have begun by taking slices parallel to the xz plane, each of length
$\Delta y$, in which case we would have ended with $\iint f(x,y)\,dx\,dy$, with $x_U$ and
$x_L$ being functions of y.

Fig. B.14.1. The volume under part of the surface z = f(x,y).

Example. Let z = 2x + y. What is the volume of the solid which is
located between the xy plane and this surface and whose base is a circle given
by the equation $x^2 + (y - b)^2 = r^2$? Solving for y, we obtain

$$y = b \pm \sqrt{r^2 - x^2}$$

For a given x, the limits of integration on y are therefore $b + \sqrt{r^2 - x^2}$ and
$b - \sqrt{r^2 - x^2}$. The limits of integration on x are r and -r. The volume is
therefore

$$\int_{-r}^{r} \int_{b - \sqrt{r^2 - x^2}}^{b + \sqrt{r^2 - x^2}} (2x + y)\,dy\,dx = \int_{-r}^{r} \left(4x\sqrt{r^2 - x^2} + 2b\sqrt{r^2 - x^2}\right) dx$$

$$= \left[-\tfrac{4}{3}(r^2 - x^2)^{3/2} + bx(r^2 - x^2)^{1/2} + br^2 \sin^{-1}(x/r)\right]_{-r}^{r}$$

in which the notation $\sin^{-1}(x/r)$ means the angle (in radians) whose sine is x/r.
As there are $2\pi$ radians in a circle, $\sin^{-1}(1) = \pi/2$ and $\sin^{-1}(-1) = -\pi/2$.

Thus the answer is $\pi b r^2$.*
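The slicing construction used to define the double integral can be imitated directly. The sketch below approximates the volume in this example by summing prisms slice by slice; the function name, the numbers of slices and segments, and the particular values b = 3 and r = 1 are hypothetical choices made only for illustration.

```python
import math


def volume_under_plane(b: float, r: float, n: int = 1000, m: int = 50) -> float:
    """Approximate the (signed) volume under z = 2x + y over the circle x^2 + (y - b)^2 = r^2.

    The base is cut into n slices of length dx; each slice is cut into m segments
    of length dy running between the limits y_L and y_U, which depend on x.
    """
    dx = 2.0 * r / n
    total = 0.0
    for i in range(n):
        x = -r + (i + 0.5) * dx            # a value of x inside the ith slice
        half = math.sqrt(r * r - x * x)
        y_lower, y_upper = b - half, b + half
        dy = (y_upper - y_lower) / m
        slice_area = 0.0
        for j in range(m):
            y = y_lower + (j + 0.5) * dy   # a value of y inside the jth segment
            slice_area += (2.0 * x + y) * dy
        total += slice_area * dx
    return total


approx_volume = volume_under_plane(b=3.0, r=1.0)
exact_volume = math.pi * 3.0 * 1.0 ** 2    # pi * b * r^2 from the example
```

With these settings the sum of prisms should come close to $\pi b r^2$; taking more slices brings it closer, which is the sense in which the integral is defined as a limit.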

In the special case $f(x,y) = f_1(x)f_2(y)$, that is, the case in which f(x,y) is the
product of two functions, each of only one variable, we have

$$\lim_{n \to \infty,\, \Delta x \to 0} \sum_{i=1}^{n} \Delta x \int_{y_L}^{y_U} f(x_i, y)\,dy = \lim_{n \to \infty,\, \Delta x \to 0} \sum_{i=1}^{n} \Delta x \int_{y_L}^{y_U} f_1(x_i)f_2(y)\,dy$$

$$= \lim_{n \to \infty,\, \Delta x \to 0} \sum_{i=1}^{n} f_1(x_i)\,\Delta x \int_{y_L}^{y_U} f_2(y)\,dy = \int_a^b f_1(x)\,dx \int_{y_L}^{y_U} f_2(y)\,dy$$

In cases in which $y_U$ and $y_L$ are constants (that is, the same for all values of x,
as when we find the volume between a surface and a rectangle in the xy plane),
this latter product is the product of two ordinary single integrals.

As with the single integral, we can take the limit of the double integral as one
or more of the limits of integration become infinite; for example,
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dy\,dx$ means

$$\lim_{b,\,y_U \to \infty;\; a,\,y_L \to -\infty} \int_a^b \int_{y_L}^{y_U} f(x,y)\,dy\,dx$$

Our definition can be extended to any number of variables. Let

$$z = f(x_1, x_2, \ldots, x_n)$$

in which each $x_i$ is a variable. Further, let z be continuous at all points.
Then $\lim_{\Delta x_1 \to 0,\, \Delta x_2 \to 0,\, \ldots,\, \Delta x_n \to 0} \sum f(x_1, x_2, \ldots, x_n)\,\Delta x_1\,\Delta x_2 \cdots \Delta x_n$
exists and is abbreviated
$\int_{R_{x_1}} \int_{R_{x_2}} \cdots \int_{R_{x_n}} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$, in which
$R_{x_i}$ indicates the boundaries of (the region of $x_i$) between which the
integration is to be performed. As in the two-variable case, the integration is
performed step by step and in any order desired. In performing each integration
the fundamental theorem of the calculus is used, that is, that a definite
integral is the difference between two values of a primitive function,
remembering that in finding a primitive function each of the variables other
than the variable of integration is treated as a constant.

* This answer, however, considers the volume negative for points such that
(2x + y) < 0, i.e., all points of the solid below the xy plane. Actually the
volumes above and below the xy plane should be found separately and then
added, considering both positive.

It is a consequence of the definition that if g and f are any two continuous
functions then the multiple integral of their sum is the sum of the two
respective multiple integrals. Further, the integral of cf, c being any constant,
is the product of c and the integral of f. That is,

$$\int\!\!\int \cdots \int (f + g) \prod dx_i = \int\!\!\int \cdots \int f \prod dx_i + \int\!\!\int \cdots \int g \prod dx_i$$

and $\int\!\!\int \cdots \int cf \prod dx_i = c \int\!\!\int \cdots \int f \prod dx_i$, with i = 1, 2, . . . , n. These
two theorems are summed up in the statement that the multiple integral, like
the single integral, is a linear operator.

Further, as in the two-variable case, if $f(x_1, x_2, \ldots, x_n) = \prod f_i(x_i)$, then

$$\int\!\!\int \cdots \int f(x_1, x_2, \ldots, x_n) \prod dx_i = \int f_1(x_1)\,dx_1 \int f_2(x_2)\,dx_2 \cdots \int f_n(x_n)\,dx_n$$

and if the limits of integration are all constants, this is the product of n
ordinary single integrals.

B.15. Joint Distributions of Random Variables. Let f(x,y) be any function
which is continuous for all values of x and y. Then f can be considered a
joint density function if it satisfies the following conditions:

$$f(x,y) \ge 0 \text{ for all values of } x \text{ and } y$$

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$$

Now consider the two variables X and Y, whose values occur in pairs, that
is, whenever X assumes a value x, Y assumes a value y. We call X and Y
random variables when their joint probability density is given by a function of
x and y that satisfies the two foregoing conditions, that is, probability density
(X = x; Y = y) = f(x,y).

Each of the variables X and Y will have its own probability density, which
is obtained by integrating f(x,y) on the other variable; that is, letting $f_1$ be the
density function for X and $f_2$ the density function for Y, we have

$$f_1(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$$

$$f_2(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$$

Similarly, we can consider a joint density function for n random variables
$X_1, X_2, \ldots, X_n$, satisfying the requirements

$$f(x_1, x_2, \ldots, x_n) \ge 0$$

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = 1$$

The density function $f_i$ of the random variable $X_i$ is obtained by integrating
over all the other variables, that is,

$$f_i(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n$$

The $X_1, X_2, \ldots, X_n$ are statistically independent if and only if the joint
density is the product of the individual densities, that is,

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots f_n(x_n)$$

Now we define a random sample of size n drawn from a population whose
density function is f. Each member of the sample is considered a random
variable $X_i$, i = 1, 2, . . . , n. The sample is random if and only if the
following two conditions hold:

1. The density function of each $X_i$ is f. (This states in mathematical
terminology the requirement that each member of the population has equal
likelihood of being included in the sample.)

2. The joint density function g of the $X_i$ is the product of their individual
densities, that is, they are statistically independent, or

$$g(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2) \cdots f(x_n)$$


(This second statement is equivalent to the requirement that the likelihood of 
inclusion of one member is unaffected by the inclusion of any other member, 
that is, that the members are drawn independently.) 

As in the case of a single random variable, the mathematical expectation of a
function of n random variables is defined as

$$E[g(X_1, X_2, \ldots, X_n)] \equiv \int_{R_1} \int_{R_2} \cdots \int_{R_n} g(x_1, x_2, \ldots, x_n)\,f(x_1, x_2, \ldots, x_n) \prod dx_i$$

in which $R_i$ is the entire range $-\infty$ to $\infty$.

Note that if $g_1, g_2, \ldots, g_k$ are k functions of the n random variables then
$E\left(\sum_{i=1}^{k} g_i\right) = \sum_{i=1}^{k} E(g_i)$. Further, if a and b are constants,

$$E[ag(X_1, X_2, \ldots, X_n) + b] = aE[g(X_1, X_2, \ldots, X_n)] + b$$

The student should prove both these statements as exercises. As in the single
variable case, E is a linear operator.

We can also define the moment-generating function of a function g of n
random variables as $E[e^{tg(X_1, \ldots, X_n)}]$. A useful theorem is that the
moment-generating function of the sum of n independent random variables is the
product of their individual moment-generating functions, that is,

$$E[e^{(\sum X_i)t}] = \prod E[e^{X_it}]$$

Proof

$$E[e^{(\sum X_i)t}] = \int\!\!\int \cdots \int e^{(\sum x_i)t} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$$

$$= \int\!\!\int \cdots \int \prod e^{x_it} f_i(x_i)\,dx_i = \prod \int e^{x_it} f_i(x_i)\,dx_i = \prod E[e^{X_it}]$$

B.16. Expectation of Sample Moments About the Origin. Consider drawing
a random sample $X_1, X_2, \ldots, X_n$ from a population with density
function f. Let g be any continuous function; then

$$E\left[\sum_{i=1}^{n} g(X_i)\right] = nE[g(X)]$$

Proof. This is merely a special case of the theorem stated in Sec. B.15, that
$E\left(\sum_{i=1}^{k} g_i\right) = \sum_{i=1}^{k} E(g_i)$. In this special case k = n and

$$g_i(X_1, X_2, \ldots, X_n) = c_1g(X_1) + c_2g(X_2) + \cdots + c_ig(X_i) + \cdots + c_ng(X_n)$$

with $c_i = 1$ and $c_j = 0$ for all $j \ne i$. We therefore have

$$E\left[\sum_{i=1}^{n} g(X_i)\right] = \sum_{i=1}^{n} E[g(X_i)]$$

Remembering that in a random sample the density function of each $X_i$ is f, the
density function of a member X picked at random from the population, then
$E[g(X_i)] = E[g(X)]$; therefore $\sum_{i=1}^{n} E[g(X_i)] = nE[g(X)]$.

Corollary

$$E\left(\frac{1}{n}\sum_{i=1}^{n} X_i^k\right) = \frac{1}{n}\,nE(X^k) = E(X^k)$$

that is, the expected (mean) value of a sample moment about the origin is the
corresponding population moment. This is what is meant by saying that the
sample moments about the origin are unbiased estimates of the corresponding
population moments. It does not follow that sample moments about the
sample mean are unbiased estimates of moments about the population mean;
in fact, the latter statement is not true.

B.17. Mean and Variance of the Sum of Independent Random Variables.
Consider the sum S of n independent random variables, $X_1, X_2, \ldots, X_n$, with
$X_i$ drawn from a population with mean $\mu_i$ and variance $\sigma_i^2$. The mean value
(expectation) of the sum S is the sum of the population means and the variance
of S is the sum of the population variances, that is,

$$E(S) = \sum_{i=1}^{n} \mu_i \qquad \sigma_S^2 = \sum_{i=1}^{n} \sigma_i^2$$

Proof. The first statement is a special case of the theorem

$$E\left[\sum_{i=1}^{k} g_i\right] = \sum_{i=1}^{k} E(g_i)$$

with k = n and $g_i = \sum_{j=1}^{n} c_jX_j$ with $c_i = 1$ and $c_j = 0$ for all $j \ne i$. That is,
$E(\sum X_i) = \sum E(X_i) = \sum \mu_i$. To prove the second we expand as follows:

$$E[S - E(S)]^2 = E\{S^2 - 2SE(S) + [E(S)]^2\} = E(S^2) - [E(S)]^2$$

$$= \sum E(X_i^2) + \sum_{i \ne j} E(X_iX_j) - \sum \mu_i^2 - \sum_{i \ne j} \mu_i\mu_j$$

As $X_i$ and $X_j$ (with $j \ne i$) are independent,

$$E(X_iX_j) = \int\!\!\int x_ix_j f(x_i, x_j)\,dx_i\,dx_j = \int x_if_i(x_i)\,dx_i \int x_jf_j(x_j)\,dx_j = E(X_i)E(X_j) = \mu_i\mu_j$$

We are left with $\sum E(X_i^2) - \sum \mu_i^2 = \sum [E(X_i^2) - \mu_i^2] = \sum \sigma_i^2$.

B.18. Law of Large Numbers. Consider drawing a random sample
$X_1, X_2, \ldots, X_n$ from a population with density function f and mean $\mu$ and
variance $\sigma^2$. Then $\sigma_m^2 = \sigma^2/n$, that is, the variance of the sample mean m is
the population variance divided by the size of the sample.

Proof

$$\sigma_m^2 = E[m - E(m)]^2 = E(m - \mu)^2 = E\left[\frac{1}{n}\sum (X_i - \mu)\right]^2$$

$$= \frac{1}{n^2} E\left[\sum (X_i - \mu)^2 + \sum_{i \ne j} (X_i - \mu)(X_j - \mu)\right]$$

$$= \frac{1}{n^2} \sum E(X_i - \mu)^2 + \frac{1}{n^2} \sum_{i \ne j} E[(X_i - \mu)(X_j - \mu)]$$

$$= \frac{1}{n^2}\,n\sigma^2 + \frac{1}{n^2} \sum_{i \ne j} E[(X_i - \mu)(X_j - \mu)]$$

But

$$E[(X_i - \mu)(X_j - \mu)] = \int\!\!\int (x_i - \mu)(x_j - \mu) f(x_i)f(x_j)\,dx_i\,dx_j$$

$$= \int (x_i - \mu)f(x_i)\,dx_i \int (x_j - \mu)f(x_j)\,dx_j = E(X_i - \mu)E(X_j - \mu) = 0$$

We are left with $\sigma^2/n$. As $n \to \infty$, $\sigma^2/n \to 0$; thus, as $E(m) = \mu$, m becomes
an increasingly good estimate of $\mu$ as n increases; in fact, the probability
that m will diverge from $\mu$ by more than a fixed amount approaches 0 as n
approaches infinity. This is what is meant by saying that m converges
stochastically to $\mu$ or that m is a consistent estimate of $\mu$.

Actually, we could have proved our theorem by utilizing the previous
theorem that the variance of the sum of n independent random variables is the
sum of the variances and then observing that as m = S/n, $\sigma_m = \sigma_S/n$; the
details are left to the student.

B.19. Tchebysheff's Inequality. Consider drawing a random sample of
size n from a population with density function f, mean $\mu$, and variance $\sigma^2$.
Then $P(|m - \mu| > b) < \sigma^2/nb^2$, that is, the probability that the sample mean
m will diverge from the population mean $\mu$ by an amount greater than b is less
than $\sigma^2/nb^2$.

Proof. Let the density function of the sample mean be g. Then

$$\int_{-\infty}^{\infty} (m - \mu)^2 g(m)\,dm = \sigma^2/n$$

as proved previously. Let c be any positive number, and break up the integral
into three parts:

$$\int_{-\infty}^{\mu - (c\sigma/\sqrt{n})} (m - \mu)^2 g(m)\,dm + \int_{\mu - (c\sigma/\sqrt{n})}^{\mu + (c\sigma/\sqrt{n})} (m - \mu)^2 g(m)\,dm + \int_{\mu + (c\sigma/\sqrt{n})}^{\infty} (m - \mu)^2 g(m)\,dm = \sigma^2/n$$

In the first integral we now replace $(m - \mu)^2$ by $c^2\sigma^2/n$. This substitution
will reduce the value of this integral, because, in the range of integration,
$m < \mu - (c\sigma/\sqrt{n})$ and thus $m - \mu < -(c\sigma/\sqrt{n})$; remembering that c, $\sigma$,
and n are all positive and therefore both sides of this inequality are negative, we
have $|m - \mu| > c\sigma/\sqrt{n}$ and thus $(m - \mu)^2 > c^2\sigma^2/n$. Similarly, in the third
integral we replace $(m - \mu)^2$ by $c^2\sigma^2/n$; by the same argument it can be shown
that this substitution reduces the value of this integral. As $(m - \mu)^2$ and
g(m) are both positive, the value of the second integral must be positive. We
have, therefore,

$$\frac{c^2\sigma^2}{n}\left[\int_{-\infty}^{\mu - (c\sigma/\sqrt{n})} g(m)\,dm + \int_{\mu + (c\sigma/\sqrt{n})}^{\infty} g(m)\,dm\right] < \sigma^2/n$$

But $\int_{-\infty}^{\mu - (c\sigma/\sqrt{n})} g(m)\,dm = \text{prob}(m < \mu - c\sigma/\sqrt{n}) = P(m - \mu < -c\sigma/\sqrt{n})$.
Similarly, $\int_{\mu + (c\sigma/\sqrt{n})}^{\infty} g(m)\,dm = P(m > \mu + c\sigma/\sqrt{n}) = P(m - \mu > c\sigma/\sqrt{n})$.
As we are dealing with mutually exclusive events, the sum of these two
probabilities is $P(|m - \mu| > c\sigma/\sqrt{n})$. Thus $c^2P(|m - \mu| > c\sigma/\sqrt{n}) < 1$, or
$P(|m - \mu| > c\sigma/\sqrt{n}) < 1/c^2$. Let $b = c\sigma/\sqrt{n}$; then $1/c^2 = \sigma^2/nb^2$.
Therefore $P(|m - \mu| > b) < \sigma^2/nb^2$.
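The inequality can be illustrated on a small case. The text treats a continuous density; the sketch below instead assumes a discrete population whose values are equally likely, so that every possible sample can be listed and the probability computed exactly. The function name and the particular population (a fair coin scored 0 or 1, with n = 4 and b = 3/10) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def tail_probability_and_bound(population, n, b):
    """Return P(|m - mu| > b) and the bound sigma^2 / (n b^2) for samples of size n.

    Assumes each population value is equally likely and members are drawn
    independently (with replacement), so all ordered samples are equally likely.
    """
    size = len(population)
    mu = Fraction(sum(population), size)
    sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / size
    samples = list(product(population, repeat=n))
    hits = sum(1 for s in samples if abs(Fraction(sum(s), n) - mu) > b)
    return Fraction(hits, len(samples)), sigma2 / (n * b ** 2)


prob, bound = tail_probability_and_bound([0, 1], n=4, b=Fraction(3, 10))
```

Here only the samples with mean 0 or 1 diverge from the population mean by more than 3/10, so the probability is 2/16, well below the bound of 25/36. The inequality guarantees only that the probability cannot exceed the bound; it is often far from sharp.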

B.20. Expectation of the Sample Variance. Consider drawing a random
sample $X_i$, i = 1, 2, . . . , n, from a population with density function f, mean
$\mu$, and variance $\sigma^2$. Let m and $s^2$ be the sample mean and variance
respectively; then $E(s^2) = \sigma^2(n - 1)/n$.

Proof

$$E(s^2) = E\left[\frac{\sum (X_i - m)^2}{n}\right] = \frac{1}{n} E\left[\sum (X_i^2 - 2mX_i + m^2)\right]$$

$$= \frac{1}{n} E\left(\sum X_i^2 - 2m\sum X_i + nm^2\right) = \frac{1}{n} E\left(\sum X_i^2 - 2nm^2 + nm^2\right)$$

$$= \frac{1}{n} E\left(\sum X_i^2 - nm^2\right) = \frac{1}{n}\,nE(X^2) - E(m^2) = E(X^2) - E(m^2)$$

For any random variable Y, $E[Y - E(Y)]^2 = E(Y^2) - [E(Y)]^2$. In
particular, $E(X^2) = \sigma^2 + \mu^2$ and $E(m^2) = \sigma_m^2 + \mu^2 = \sigma^2/n + \mu^2$. Therefore,

$$E(s^2) = \sigma^2 + \mu^2 - \sigma^2/n - \mu^2 = \sigma^2\left(1 - \frac{1}{n}\right) = \sigma^2\left(\frac{n - 1}{n}\right)$$

Corollary. $E[s^2n/(n - 1)] = \sigma^2$; that is, $s^2n/(n - 1)$ is an unbiased
estimate of $\sigma^2$.
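The results of Secs. B.18 and B.20 can be verified exactly on a small case by listing every possible sample. As an assumption made only for illustration, the sketch below replaces the continuous density of the text with a discrete population of four equally likely values; the values themselves and the sample size are hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]  # hypothetical population, each value equally likely
n = 3                      # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

# Every ordered sample of size n, drawn independently, is equally likely.
samples = list(product(population, repeat=n))


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def sample_variance(sample):
    """s^2 with divisor n, as defined in the text."""
    m = sample_mean(sample)
    return sum((Fraction(v) - m) ** 2 for v in sample) / len(sample)


expected_m = sum(sample_mean(s) for s in samples) / len(samples)
variance_of_m = sum((sample_mean(s) - mu) ** 2 for s in samples) / len(samples)
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)
```

Because exact fractions are used, the comparisons are equalities rather than approximations: expected_m equals mu, variance_of_m equals sigma2 / n, and expected_s2 equals sigma2 (n - 1) / n, so that multiplying the sample variance by n / (n - 1) removes the bias.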

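B.21. Properties of Normal Distributions. The function

$$f(x) = \frac{1}{\sqrt{2\pi}\,k}\,e^{-(1/2k^2)(x - c)^2}$$

with k positive is suitable as a density function, that is, f(x) > 0 and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

Proof. Clearly f(x) > 0, as $1/\sqrt{2\pi}\,k$ is positive and any power of e is
positive. To prove the second part let y = (x - c)/k; then

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} f(x)\,\frac{dx}{dy}\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y^2/2)}\,dy$$

We need only prove that $\int_{-\infty}^{\infty} e^{-(y^2/2)}\,dy = \sqrt{2\pi}$. Unfortunately we can give
only a partial proof. Take

$$\left[\int_{-\infty}^{\infty} e^{-(y^2/2)}\,dy\right]^2 = \int_{-\infty}^{\infty} e^{-(y^2/2)}\,dy \int_{-\infty}^{\infty} e^{-(z^2/2)}\,dz = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(1/2)(y^2 + z^2)}\,dy\,dz$$

The pair of values (y,z) can be considered as a point in the yz plane. Then
$y = r\cos\theta$, $z = r\sin\theta$, as shown in Fig. B.21.1 (this is called a transformation
to polar coordinates). Then $y^2 + z^2 = r^2(\cos^2\theta + \sin^2\theta) = r^2$. We state
without proof¹ that $dy\,dz = r\,dr\,d\theta$. As y and z range from $-\infty$ to $\infty$,
r ranges from 0 to $\infty$ and $\theta$ from 0 to $2\pi$ (radians). Thus the double integral
becomes

$$\int_0^{2\pi} \int_0^{\infty} re^{-(r^2/2)}\,dr\,d\theta = \int_0^{2\pi} \left[-e^{-(r^2/2)}\right]_0^{\infty} d\theta = \int_0^{2\pi} d\theta = 2\pi$$

Therefore $\int_{-\infty}^{\infty} e^{-(y^2/2)}\,dy = \sqrt{2\pi}$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Fig. B.21.1. Transformation of rectangular coordinates (y,z) to polar
coordinates (r,θ).

¹ In making a joint transformation of two variables the Jacobian is used, that is,

$$dy\,dz = \begin{vmatrix} \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \\ \dfrac{\partial z}{\partial r} & \dfrac{\partial z}{\partial \theta} \end{vmatrix} dr\,d\theta$$

Good discussions of the Jacobian are given by Fry (7) and Courant (1).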
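The moment-generating function of a distribution with density given by
$\frac{1}{\sqrt{2\pi}\,k}\,e^{-(1/2k^2)(x - c)^2}$ is $e^{ct + (k^2t^2/2)}$.

Proof

$$E(e^{tX}) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,k}\,e^{tx}\,e^{-(1/2k^2)(x - c)^2}\,dx$$

$$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,k}\,e^{-\frac{1}{2k^2}(x^2 - 2cx + c^2) + tx - \left(ct + \frac{k^2t^2}{2}\right) + \left(ct + \frac{k^2t^2}{2}\right)}\,dx$$

$$= e^{ct + \frac{k^2t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,k}\,e^{-\frac{1}{2k^2}(x^2 - 2cx + c^2 - 2k^2tx + 2k^2ct + k^4t^2)}\,dx$$

$$= e^{ct + \frac{k^2t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,k}\,e^{-\frac{1}{2k^2}[x - (c + k^2t)]^2}\,dx$$

As $c + k^2t$ is a constant, the function under the integral is in normal form;
therefore the integral must have the value 1, and we are left with
$e^{ct + (k^2t^2/2)}$.

Corollary. $\mu = c$.

Proof

$$E(e^{tX}) = e^{ct + \frac{k^2t^2}{2}}$$

$$\frac{dE(e^{tX})}{dt} = e^{ct + \frac{k^2t^2}{2}}(c + k^2t)$$

Letting t = 0 we obtain 1(c + 0) = c.

Corollary. $\sigma^2 = k^2$.

Proof. To find $E(X^2)$ we take

$$\frac{d^2E(e^{tX})}{dt^2} = e^{ct + \frac{k^2t^2}{2}}(k^2) + (c + k^2t)\,\frac{dE(e^{tX})}{dt}$$

Letting t = 0, we obtain $1(k^2) + (c)(c) = k^2 + c^2$. Then

$$\sigma^2 = E(X^2) - [E(X)]^2 = k^2 + c^2 - c^2 = k^2$$

Thus we can write the normal density function as

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(1/2\sigma^2)(x - \mu)^2}$$

Note that in the important special case $\mu = 0$, $\sigma = 1$, $E(e^{tX}) = e^{t^2/2}$.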
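The closed form of the normal moment-generating function can be compared with a direct numerical evaluation of the defining integral. This is only a sketch: the function names, the integration range of twelve standard deviations on either side of c, and the values c = 1, k = 2, t = 0.3 are arbitrary illustrative choices.

```python
import math


def normal_density(x: float, c: float, k: float) -> float:
    """Normal density with mean c and standard deviation k."""
    return math.exp(-((x - c) ** 2) / (2.0 * k * k)) / (math.sqrt(2.0 * math.pi) * k)


def mgf_numeric(t: float, c: float, k: float, half_width: float = 12.0, steps: int = 40000) -> float:
    """Midpoint-rule approximation to the integral of e^(tx) f(x) dx."""
    lower = c - half_width * k
    upper = c + half_width * k
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += math.exp(t * x) * normal_density(x, c, k)
    return total * width


def mgf_closed_form(t: float, c: float, k: float) -> float:
    """The result derived in the text: e^(ct + k^2 t^2 / 2)."""
    return math.exp(c * t + k * k * t * t / 2.0)


numeric_value = mgf_numeric(0.3, c=1.0, k=2.0)
closed_value = mgf_closed_form(0.3, c=1.0, k=2.0)
```

For small values of t such as this one, the two values should agree closely; the finite range of integration would need widening if t were large relative to 1/k.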

In any normal distribution the amount of probability in the interval $\mu$ to
$\mu + b\sigma$ depends only upon b, that is,

$$\int_{\mu_1}^{\mu_1 + b\sigma_1} \phi(\mu_1, \sigma_1^2)\,dx = \int_{\mu_2}^{\mu_2 + b\sigma_2} \phi(\mu_2, \sigma_2^2)\,dx$$

for any $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, in which $\phi$ means the normal function.

Proof. Let $Y = \frac{X - \mu_1}{\sigma_1}$; then

$$f(y) = f(x)\left|\frac{dx}{dy}\right| = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}$$

As x ranges from $\mu_1$ to $\mu_1 + b\sigma_1$, y ranges from 0 to b; thus

$$\int_{\mu_1}^{\mu_1 + b\sigma_1} \phi(\mu_1, \sigma_1^2)\,dx = \int_0^b \phi(0,1)\,dy$$

and the latter integral is obviously independent of the values of $\mu_1$ and $\sigma_1$.

Any linear transformation leaves the functional form of a normal distribution
unchanged, that is, if X is distributed with density $\phi(\mu, \sigma^2)$, and if
Y = aX + b, then Y is distributed with density $\phi(a\mu + b, a^2\sigma^2)$.

Proof

$$f(y) = f(x)\left|\frac{dx}{dy}\right| = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2\sigma^2}\left(\frac{y - b}{a} - \mu\right)^2}\,\frac{1}{|a|}$$

$$= \frac{1}{\sqrt{2\pi}\,|a|\sigma}\,e^{-\frac{1}{2a^2\sigma^2}[y - (a\mu + b)]^2}$$

which is in normal form, with $c = a\mu + b$ and $k = |a|\sigma$.

The sum of n independent normally distributed random variates is itself
normally distributed with mean equal to the sum of the means and variance
equal to the sum of the variances, that is, if $X_i$ is distributed according to
$\phi(\mu_i, \sigma_i^2)$, i = 1, 2, . . . , n, and if all $X_i$ are independent of each other, then
$\sum X_i$ is distributed according to $\phi(\sum \mu_i, \sum \sigma_i^2)$.

Proof. We have proved in Sec. B.15 that the moment-generating function
of the sum of n independent random variables is the product of their individual
moment-generating functions, that is, $E[e^{(\sum X_i)t}] = \prod E[e^{X_it}]$. In this case
$E[e^{X_it}] = e^{\mu_it + \sigma_i^2t^2/2}$. Thus $E[e^{(\sum X_i)t}] = e^{(\sum \mu_i)t + (\sum \sigma_i^2)t^2/2}$, which is the
moment-generating function for a normal distribution with mean $\sum \mu_i$ and
variance $\sum \sigma_i^2$.

If X is normally distributed with mean $\mu_X$ and variance $\sigma_X^2$, and Y is
normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$, and if X and Y are
independent, then X - Y is normally distributed with mean $\mu_X - \mu_Y$ and
variance $\sigma_X^2 + \sigma_Y^2$.

Proof. As X and Y are independent, X and Z are also independent, where
Z = -Y. Then

$$E[e^{(X - Y)t}] = E[e^{(X + Z)t}] = E(e^{Xt})E(e^{Zt}) = e^{\mu_Xt + \sigma_X^2t^2/2}\,e^{-\mu_Yt + \sigma_Y^2t^2/2}$$

$$= e^{(\mu_X - \mu_Y)t + (\sigma_X^2 + \sigma_Y^2)t^2/2}$$

which is the moment-generating function of a normal distribution with mean
$\mu_X - \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.

B.22. Properties of Chi-square Distributions. In the text chi square with
n degrees of freedom was defined as the sum of the squares of n independent
random variates each distributed according to $\phi(0,1)$. It is more customary
to define chi square as the random variate with the density given by

$$f(\chi_n^2) = \frac{(\chi_n^2)^{(n/2) - 1}\,e^{-\chi_n^2/2}}{2^{n/2}\,\Gamma(n/2)}$$

in which $\Gamma(n/2)$ is the gamma function, defined by $\Gamma(k) \equiv \int_0^\infty x^{k-1}e^{-x}\,dx$.

We shall use this conventional definition and show that the moment-generating
function is $(1 - 2t)^{-n/2}$. Let $z \equiv \chi_n^2$; then

$$E(e^{zt}) = \int_0^\infty e^{zt} f(z)\,dz = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty z^{(n/2) - 1}\,e^{-(z/2)(1 - 2t)}\,dz$$

Let w = (z/2)(1 - 2t); then z = 2w/(1 - 2t) and $\frac{dz}{dw} = 2/(1 - 2t)$; thus

$$E(e^{zt}) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^\infty e^{-w} \left(\frac{2w}{1 - 2t}\right)^{(n/2) - 1} \frac{2}{1 - 2t}\,dw$$

$$= \frac{(1 - 2t)^{-(n/2)}}{\Gamma(n/2)} \int_0^\infty e^{-w} w^{(n/2) - 1}\,dw = (1 - 2t)^{-n/2}$$

as the integral is by definition $\Gamma(n/2)$.

The moments of the $\chi_n^2$ distribution are obtained from the moment-generating
function as follows:

$$\frac{d(1 - 2t)^{-n/2}}{dt} = n(1 - 2t)^{-(n/2) - 1}$$

Setting t = 0, we have $E(\chi_n^2) = n$. Differentiating again, we have

$$\frac{d^2(1 - 2t)^{-n/2}}{dt^2} = n(n + 2)(1 - 2t)^{-(n/2) - 2}$$

Setting t = 0, we have $E[(\chi_n^2)^2] = n^2 + 2n$; then $\sigma^2 = n^2 + 2n - n^2 = 2n$.
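The mean n and variance 2n can be checked by integrating the chi-square density numerically. The sketch below is one way to do so under stated assumptions: the function names, the upper limit of integration, and the choice n = 5 are illustrative, and the integral is approximated by the midpoint rule.

```python
import math


def chi_square_density(z: float, n: int) -> float:
    """Density z^(n/2 - 1) e^(-z/2) / (2^(n/2) Gamma(n/2)) for z > 0."""
    return z ** (n / 2.0 - 1.0) * math.exp(-z / 2.0) / (2.0 ** (n / 2.0) * math.gamma(n / 2.0))


def moment_about_origin(k: int, n: int, upper: float = 150.0, steps: int = 100000) -> float:
    """Midpoint-rule approximation to the kth moment about the origin."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * width
        total += z ** k * chi_square_density(z, n)
    return total * width


n = 5
total_probability = moment_about_origin(0, n)
mean = moment_about_origin(1, n)
variance = moment_about_origin(2, n) - mean ** 2
```

The zeroth moment should be close to 1, the mean close to n, and the variance close to 2n, in agreement with the results obtained from the moment-generating function.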

The sum of squares of n independent random variates, each distributed
according to $\phi(0,1)$, has the moment-generating function $(1 - 2t)^{-n/2}$ and
therefore has the $\chi_n^2$ distribution.

Proof. Let $Y = \sum X_i^2$, each $X_i$ being distributed according to $\phi(0,1)$ and
each being independent of the others. Then

$$E(e^{Yt}) = \prod E(e^{X_i^2t}) = [E(e^{X^2t})]^n$$

But

$$E(e^{X^2t}) = \int_{-\infty}^{\infty} e^{x^2t}\,\phi(0,1)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{x^2t}\,e^{-(x^2/2)}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x^2/2)(1 - 2t)}\,dx$$

Let $w \equiv x(1 - 2t)^{1/2}$; then

$$\frac{dx}{dw} = (1 - 2t)^{-1/2}$$

and

$$E(e^{X^2t}) = \frac{(1 - 2t)^{-1/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(w^2/2)}\,dw = (1 - 2t)^{-1/2}$$

as the integral has the value $\sqrt{2\pi}$ [or, we could put the denominator under the
integral and observe that under the integral we would have the function
$\phi(0,1)$, thus making the integral 1]. Thus $E(e^{Yt}) = (1 - 2t)^{-n/2}$, and
therefore Y is distributed as $\chi_n^2$.

If $\chi_m^2$ and $\chi_n^2$ are distributed independently, then $\chi_m^2 + \chi_n^2$ is distributed as
$\chi_{m+n}^2$.

Proof

$$E[e^{(\chi_m^2 + \chi_n^2)t}] = E(e^{\chi_m^2t})E(e^{\chi_n^2t}) = (1 - 2t)^{-m/2}(1 - 2t)^{-n/2} = (1 - 2t)^{-(m+n)/2}$$

which is the moment-generating function for $\chi_{m+n}^2$.

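B.23. Distribution of $t_n$. If X is normally distributed with zero mean
and unit variance and Y is independently distributed as $\chi_n^2$, then the
distribution of $t_n = \frac{X}{\sqrt{Y/n}}$ is given by

$$f(t) = \frac{1}{\sqrt{n\pi}}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma(n/2)}\left(\frac{t^2}{n} + 1\right)^{-(n+1)/2}$$

Proof

$$f(x,y) = f(x)f(y) = \frac{1}{\sqrt{2\pi}}\,e^{-(x^2/2)}\,\frac{1}{2^{n/2}\,\Gamma(n/2)}\,y^{(n/2) - 1}e^{-y/2} = \frac{1}{\sqrt{2\pi}\,2^{n/2}\,\Gamma(n/2)}\,y^{(n/2) - 1}e^{-(x^2 + y)/2}$$

Let $t = \frac{x}{\sqrt{y/n}}$ be a change of variable, holding y fixed; also let

$$c \equiv \frac{1}{\sqrt{2\pi}\,2^{n/2}\,\Gamma(n/2)}$$

Then $x = t\sqrt{y/n}$, $\frac{dx}{dt} = \sqrt{y/n}$, and

$$f(t,y) = f(x,y)\,\frac{dx}{dt} = c\,y^{(n-1)/2}\,e^{-(t^2y + ny)/2n}\,n^{-1/2}$$

Then

$$f(t) = \int_{R_y} f(t,y)\,dy = c\,n^{-1/2} \int_0^\infty y^{(n-1)/2}\,e^{-(t^2y + ny)/2n}\,dy$$

Now let $w \equiv (t^2y + ny)/2n$, holding t fixed; then $y = 2w\left(\frac{t^2}{n} + 1\right)^{-1}$ and

$$f(t) = c\,n^{-1/2} \int_0^\infty \left[2w\left(\frac{t^2}{n} + 1\right)^{-1}\right]^{(n-1)/2} e^{-w}\,2\left(\frac{t^2}{n} + 1\right)^{-1} dw$$

$$= c\,n^{-1/2}\,2^{(n+1)/2}\left(\frac{t^2}{n} + 1\right)^{-(n+1)/2} \int_0^\infty w^{(n+1)/2 - 1}e^{-w}\,dw$$

$$= c\,n^{-1/2}\,2^{(n+1)/2}\left(\frac{t^2}{n} + 1\right)^{-(n+1)/2} \Gamma\left(\frac{n+1}{2}\right)$$

$$= \frac{1}{\sqrt{n\pi}}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma(n/2)}\left(\frac{t^2}{n} + 1\right)^{-(n+1)/2}$$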
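Two simple checks on the density just derived are that it integrates to 1 and that for n = 1 it reduces to $1/[\pi(1 + t^2)]$, since $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$. The sketch below carries out both; the function names, the integration range, and the choice n = 5 for the first check are assumptions made for illustration.

```python
import math


def t_density(t: float, n: int) -> float:
    """Density of t with n degrees of freedom, as given in the text."""
    constant = math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
    return constant * (t * t / n + 1.0) ** (-(n + 1) / 2.0)


def integrate(func, lower: float, upper: float, steps: int) -> float:
    """Midpoint-rule approximation to a definite integral."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


# Total probability for n = 5; the tails beyond +/-60 are negligible for this n.
total_probability = integrate(lambda t: t_density(t, 5), -60.0, 60.0, 120000)

# For n = 1 the density should reduce to 1 / (pi (1 + t^2)).
one_degree_value = t_density(0.7, 1)
reduced_form_value = 1.0 / (math.pi * (1.0 + 0.7 ** 2))
```

The integration range would need to be much wider for very small n, where the tails of the distribution are heavy.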


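B.24. Distribution of $F_{m,n}$. If X and Y are independently distributed as
$\chi_m^2$ and $\chi_n^2$ respectively, then the distribution of $F_{m,n} = \frac{X/m}{Y/n}$ is given by

$$f(F_{m,n}) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{m/2} F^{(m/2) - 1}\left(1 + \frac{m}{n}F\right)^{-(m+n)/2}$$

Proof

$$f(x,y) = f(x)f(y) = \frac{1}{2^{m/2}\,\Gamma(m/2)\,2^{n/2}\,\Gamma(n/2)}\,x^{(m/2) - 1}\,y^{(n/2) - 1}\,e^{-(x + y)/2}$$

Let

$$c \equiv \frac{1}{2^{m/2}\,\Gamma(m/2)\,2^{n/2}\,\Gamma(n/2)}$$

and $w = \frac{x}{y}$ with y fixed. Then x = wy, $\frac{dx}{dw} = y$,

$$f(w,y) = f(x,y)\,\frac{dx}{dw} = c\,w^{(m/2) - 1}\,y^{(m+n)/2 - 1}\,e^{-(y/2)(w + 1)}$$

and

$$f(w) = \int f(w,y)\,dy = c\,w^{(m/2) - 1} \int_0^\infty y^{(m+n)/2 - 1}\,e^{-(y/2)(w + 1)}\,dy$$

Letting z = (y/2)(w + 1), with w fixed, then

$$f(w) = c\,w^{(m/2) - 1}(w + 1)^{-(m+n)/2}\,2^{(m+n)/2} \int_0^\infty z^{(m+n)/2 - 1}e^{-z}\,dz$$

$$= c\,2^{(m+n)/2}\,\Gamma\left(\frac{m+n}{2}\right) w^{(m/2) - 1}(w + 1)^{-(m+n)/2} = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\,w^{(m/2) - 1}(1 + w)^{-(m+n)/2}$$

Let

$$F_{m,n} = \frac{x/m}{y/n} = \frac{n}{m}\,w$$

Then $w = \frac{m}{n}F$, $\frac{dw}{dF} = \frac{m}{n}$, and

$$f(F_{m,n}) = f(w)\,\frac{dw}{dF} = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{m/2} F^{(m/2) - 1}\left(1 + \frac{m}{n}F\right)^{-(m+n)/2}$$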


as 
B.25. The Linear Mean Regression Line. Let X and Y be jointly distributed
random variates; then to determine a linear function $Y_p = aX + b$
such that $E(Y - Y_p)^2$ is a minimum we set $a = \rho_{xy}\frac{\sigma_y}{\sigma_x}$ and
$b = \mu_y - \rho_{xy}\frac{\sigma_y}{\sigma_x}\mu_x$.

Proof

$$E(Y - Y_p)^2 = E(Y^2 - 2YY_p + Y_p^2) = E(Y^2) - 2E(YY_p) + E(Y_p^2)$$

$$= E(Y^2) - 2E(aXY + bY) + E(a^2X^2 + 2abX + b^2)$$

$$= E(Y^2) - 2aE(XY) - 2bE(Y) + a^2E(X^2) + 2abE(X) + b^2$$

Differentiating with respect to a, holding b fixed, we obtain

$$\frac{\partial E(Y - Y_p)^2}{\partial a} = -2E(XY) + 2aE(X^2) + 2bE(X)$$

Differentiating with respect to b, holding a fixed, we obtain

$$\frac{\partial E(Y - Y_p)^2}{\partial b} = -2E(Y) + 2aE(X) + 2b$$

Setting both derivatives equal to 0 and solving simultaneously, we obtain

$$a = \frac{E(XY) - E(X)E(Y)}{E(X^2) - [E(X)]^2} = \rho_{xy}\frac{\sigma_y}{\sigma_x}$$

$$b = E(Y) - aE(X) = \mu_y - \rho_{xy}\frac{\sigma_y}{\sigma_x}\mu_x$$

As both second derivatives are positive, we conclude that a and b minimize
$E(Y - Y_p)^2$. An exactly equivalent proof holds for a sample, that is, with
all parameters replaced by statistics.
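For the sample version mentioned in the last sentence, the two formulas translate directly into a short computation. The sketch below replaces each expectation by the corresponding sample mean; the function names and the data are hypothetical and serve only to illustrate that the resulting a and b give a smaller mean squared deviation than nearby values.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line y_p = a x + b, with expectations replaced by sample means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    a = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    b = mean_y - a * mean_x
    return a, b


def mean_squared_deviation(xs, ys, a, b):
    """Sample analogue of E(Y - Y_p)^2."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
a, b = regression_coefficients(xs, ys)
best = mean_squared_deviation(xs, ys, a, b)

# Any small change in a or b should not decrease the mean squared deviation.
perturbed = [
    mean_squared_deviation(xs, ys, a + da, b + db)
    for da in (-0.01, 0.0, 0.01)
    for db in (-0.01, 0.0, 0.01)
]
```

Every entry of perturbed is at least as large as best, which is the numerical counterpart of the statement that a and b minimize the expected squared deviation.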

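B.26. Bivariate Normal Distributions. We write bivariate normal
distributions in the following form:

$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}\,\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2}\right]\right\}$$

Then $f(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$. Let $w = \frac{x - \mu_x}{\sigma_x}$ and $z = \frac{y - \mu_y}{\sigma_y}$; then
$\frac{dy}{dz} = \sigma_y$ and

$$f(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_x\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(w^2 - 2\rho wz + z^2)}\,dz$$

Note that

$$w^2 - 2\rho wz + z^2 = z^2 - 2\rho wz + \rho^2w^2 + w^2 - \rho^2w^2 = (z - \rho w)^2 + w^2(1 - \rho^2)$$

Thus the integral becomes

$$\int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_x\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(z - \rho w)^2}\,e^{-\frac{w^2}{2}}\,dz = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{w^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(z - \rho w)^2}\,dz$$

As w is treated as a constant, the value of the latter integral is 1, as it is the
integral of a normal density function with variance $(1 - \rho^2)$ and mean $\rho w$.
Thus

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{w^2}{2}} = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{1}{2\sigma_x^2}(x - \mu_x)^2}$$

which is a normal density function. By symmetry,

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\,e^{-\frac{1}{2\sigma_y^2}(y - \mu_y)^2}$$

Note that

$$f(x|y) = f(x,y)/f(y) = \frac{\dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(w^2 - 2\rho wz + z^2)}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_y}\,e^{-\frac{z^2}{2}}}$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}[w^2 - 2\rho wz + z^2 - (1 - \rho^2)z^2]}$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(w - \rho z)^2}$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)\sigma_x^2}\left[x - \left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)\right)\right]^2}$$

which is the normal density with variance $\sigma_x^2(1 - \rho^2)$ and mean

$$\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)$$

This shows that all conditional distributions are normal and further that the
mean is a linear function of y, that is, that the mean regression of X on Y is the
straight line

$$m_{x|y} = \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y) + \mu_x$$

The variance of each conditional distribution is $\sigma_x^2(1 - \rho^2)$ and thus does
not depend upon the value of Y. Therefore, the conditional distributions of
bivariate normal distributions are homoscedastic.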
■ i y : 
Let f(x,y) be any constant K; then

$$K = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}(w^2 - 2\rho wz + z^2)}$$

Therefore, $w^2 - 2\rho wz + z^2 = C$, in which C is some positive constant. Then

$$\frac{1}{\sigma_x^2}(x - \mu_x)^2 - \frac{2\rho}{\sigma_x\sigma_y}(x - \mu_x)(y - \mu_y) + \frac{1}{\sigma_y^2}(y - \mu_y)^2 = C$$

Considering this a quadratic in $(x - \mu_x)$ and $(y - \mu_y)$ and applying the
discriminant, we have

$$b^2 - 4ac = \frac{4\rho^2}{\sigma_x^2\sigma_y^2} - \frac{4}{\sigma_x^2\sigma_y^2} = \frac{4(\rho^2 - 1)}{\sigma_x^2\sigma_y^2} \le 0$$

Therefore, unless $\rho^2 = 1$, the curves of equal probability are ellipses in the
$(x - \mu_x)(y - \mu_y)$ plane, and therefore in the xy plane.


Appendix C 

MISCELLANEOUS TABLES 
Table C.1. Squares and Square Roots of Numbers from 1 to 1,000*

Number     Square  Square root    Number     Square  Square root
     1          1       1.0000        41      16 81       6.4031
     2          4       1.4142        42      17 64       6.4807
     3          9       1.7321        43      18 49       6.5574
     4         16       2.0000        44      19 36       6.6332
     5         25       2.2361        45      20 25       6.7082
     6         36       2.4495        46      21 16       6.7823
     7         49       2.6458        47      22 09       6.8557
     8         64       2.8284        48      23 04       6.9282
     9         81       3.0000        49      24 01       7.0000
    10       1 00       3.1623        50      25 00       7.0711
    11       1 21       3.3166        51      26 01       7.1414
    12       1 44       3.4641        52      27 04       7.2111
    13       1 69       3.6056        53      28 09       7.2801
    14       1 96       3.7417        54      29 16       7.3485
    15       2 25       3.8730        55      30 25       7.4162
    16       2 56       4.0000        56      31 36       7.4833
    17       2 89       4.1231        57      32 49       7.5498
    18       3 24       4.2426        58      33 64       7.6158
    19       3 61       4.3589        59      34 81       7.6811
    20       4 00       4.4721        60      36 00       7.7460
    21       4 41       4.5826        61      37 21       7.8102
    22       4 84       4.6904        62      38 44       7.8740
    23       5 29       4.7958        63      39 69       7.9373
    24       5 76       4.8990        64      40 96       8.0000
    25       6 25       5.0000        65      42 25       8.0623
    26       6 76       5.0990        66      43 56       8.1240
    27       7 29       5.1962        67      44 89       8.1854
    28       7 84       5.2915        68      46 24       8.2462
    29       8 41       5.3852        69      47 61       8.3066
    30       9 00       5.4772        70      49 00       8.3666
    31       9 61       5.5678        71      50 41       8.4261
    32      10 24       5.6569        72      51 84       8.4853
    33      10 89       5.7446        73      53 29       8.5440
    34      11 56       5.8310        74      54 76       8.6023
    35      12 25       5.9161        75      56 25       8.6603
    36      12 96       6.0000        76      57 76       8.7178
    37      13 69       6.0828        77      59 29       8.7750
    38      14 44       6.1644        78      60 84       8.8318
    39      15 21       6.2450        79      62 41       8.8882
    40      16 00       6.3246        80      64 00       8.9443

* By permission from Statistics for students of psychology and education, by H.
Sorenson. Copyright 1936, McGraw-Hill Book Company, Inc.

Number     Square  Square root    Number     Square  Square root
    81      65 61       9.0000       121    1 46 41      11.0000
    82      67 24       9.0554       122    1 48 84      11.0454
    83      68 89       9.1104       123    1 51 29      11.0905
    84      70 56       9.1652       124    1 53 76      11.1355
    85      72 25       9.2195       125    1 56 25      11.1803
    86      73 96       9.2736       126    1 58 76      11.2250
    87      75 69       9.3274       127    1 61 29      11.2694
    88      77 44       9.3808       128    1 63 84      11.3137
    89      79 21       9.4340       129    1 66 41      11.3578
    90      81 00       9.4868       130    1 69 00      11.4018
    91      82 81       9.5394       131    1 71 61      11.4455
    92      84 64       9.5917       132    1 74 24      11.4891
    93      86 49       9.6437       133    1 76 89      11.5326
    94      88 36       9.6954       134    1 79 56      11.5758
    95      90 25       9.7468       135    1 82 25      11.6190
    96      92 16       9.7980       136    1 84 96      11.6619
    97      94 09       9.8489       137    1 87 69      11.7047
    98      96 04       9.8995       138    1 90 44      11.7473
    99      98 01       9.9499       139    1 93 21      11.7898
   100    1 00 00      10.0000       140    1 96 00      11.8322
   101    1 02 01      10.0499       141    1 98 81      11.8743
   102    1 04 04      10.0995       142    2 01 64      11.9164
   103    1 06 09      10.1489       143    2 04 49      11.9583
   104    1 08 16      10.1980       144    2 07 36      12.0000
   105    1 10 25      10.2470       145    2 10 25      12.0416
   106    1 12 36      10.2956       146    2 13 16      12.0830
   107    1 14 49      10.3441       147    2 16 09      12.1244
   108    1 16 64      10.3923       148    2 19 04      12.1655
   109    1 18 81      10.4403       149    2 22 01      12.2066
   110    1 21 00      10.4881       150    2 25 00      12.2474
   111    1 23 21      10.5357       151    2 28 01      12.2882
   112    1 25 44      10.5830       152    2 31 04      12.3288
   113    1 27 69      10.6301       153    2 34 09      12.3693
   114    1 29 96      10.6771       154    2 37 16      12.4097
   115    1 32 25      10.7238       155    2 40 25      12.4499
   116    1 34 56      10.7703       156    2 43 36      12.4900
   117    1 36 89      10.8167       157    2 46 49      12.5300
   118    1 39 24      10.8628       158    2 49 64      12.5698
   119    1 41 61      10.9087       159    2 52 81      12.6095
   120    1 44 00      10.9545       160    2 56 00      12.6491

Number     Square  Square root    Number     Square  Square root
   161    2 59 21      12.6886       201    4 04 01      14.1774
   162    2 62 44      12.7279       202    4 08 04      14.2127
   163    2 65 69      12.7671       203    4 12 09      14.2478
   164    2 68 96      12.8062       204    4 16 16      14.2829
   165    2 72 25      12.8452       205    4 20 25      14.3178
   166    2 75 56      12.8841       206    4 24 36      14.3527
   167    2 78 89      12.9228       207    4 28 49      14.3875
   168    2 82 24      12.9615       208    4 32 64      14.4222
   169    2 85 61      13.0000       209    4 36 81      14.4568
   170    2 89 00      13.0384       210    4 41 00      14.4914
   171    2 92 41      13.0767       211    4 45 21      14.5258
   172    2 95 84      13.1149       212    4 49 44      14.5602
   173    2 99 29      13.1529       213    4 53 69      14.5945
   174    3 02 76      13.1909       214    4 57 96      14.6287
   175    3 06 25      13.2288       215    4 62 25      14.6629
   176    3 09 76      13.2665       216    4 66 56      14.6969
   177    3 13 29      13.3041       217    4 70 89      14.7309
   178    3 16 84      13.3417       218    4 75 24      14.7648
   179    3 20 41      13.3791       219    4 79 61      14.7986
   180    3 24 00      13.4164       220    4 84 00      14.8324
   181    3 27 61      13.4536       221    4 88 41      14.8661
   182    3 31 24      13.4907       222    4 92 84      14.8997
   183    3 34 89      13.5277       223    4 97 29      14.9332
   184    3 38 56      13.5647       224    5 01 76      14.9666
   185    3 42 25      13.6015       225    5 06 25      15.0000
   186    3 45 96      13.6382       226    5 10 76      15.0333
   187    3 49 69      13.6748       227    5 15 29      15.0665
   188    3 53 44      13.7113       228    5 19 84      15.0997
   189    3 57 21      13.7477       229    5 24 41      15.1327
   190    3 61 00      13.7840       230    5 29 00      15.1658
   191    3 64 81      13.8203       231    5 33 61      15.1987
   192    3 68 64      13.8564       232    5 38 24      15.2315
   193    3 72 49      13.8924       233    5 42 89      15.2643
   194    3 76 36      13.9284       234    5 47 56      15.2971
   195    3 80 25      13.9642       235    5 52 25      15.3297
   196    3 84 16      14.0000       236    5 56 96      15.3623
   197    3 88 09      14.0357       237    5 61 69      15.3948
   198    3 92 04      14.0712       238    5 66 44      15.4272
   199    3 96 01      14.1067       239    5 71 21      15.4596
   200    4 00 00      14.1421       240    5 76 00      15.4919

Number     Square  Square root    Number     Square  Square root
   241    5 80 81      15.5242       281    7 89 61      16.7631
   242    5 85 64      15.5563       282    7 95 24      16.7929
   243    5 90 49      15.5885       283    8 00 89      16.8226
   244    5 95 36      15.6205       284    8 06 56      16.8523
   245    6 00 25      15.6525       285    8 12 25      16.8819
   246    6 05 16      15.6844       286    8 17 96      16.9115
   247    6 10 09      15.7162       287    8 23 69      16.9411
   248    6 15 04      15.7480       288    8 29 44      16.9706
   249    6 20 01      15.7797       289    8 35 21      17.0000
   250    6 25 00      15.8114       290    8 41 00      17.0294
   251    6 30 01      15.8430       291    8 46 81      17.0587
   252    6 35 04      15.8745       292    8 52 64      17.0880
   253    6 40 09      15.9060       293    8 58 49      17.1172
   254    6 45 16      15.9374       294    8 64 36      17.1464
   255    6 50 25      15.9687       295    8 70 25      17.1756
   256    6 55 36      16.0000       296    8 76 16      17.2047
   257    6 60 49      16.0312       297    8 82 09      17.2337
   258    6 65 64      16.0624       298    8 88 04      17.2627
   259    6 70 81      16.0935       299    8 94 01      17.2916
   260    6 76 00      16.1245       300    9 00 00      17.3205
   261    6 81 21      16.1555       301    9 06 01      17.3494
   262    6 86 44      16.1864       302    9 12 04      17.3781
   263    6 91 69      16.2173       303    9 18 09      17.4069
   264    6 96 96      16.2481       304    9 24 16      17.4356
   265    7 02 25      16.2788       305    9 30 25      17.4642
   266    7 07 56      16.3095       306    9 36 36      17.4929
   267    7 12 89      16.3401       307    9 42 49      17.5214
   268    7 18 24      16.3707       308    9 48 64      17.5499
   269    7 23 61      16.4012       309    9 54 81      17.5784
   270    7 29 00      16.4317       310    9 61 00      17.6068
   271    7 34 41      16.4621       311    9 67 21      17.6352
   272    7 39 84      16.4924       312    9 73 44      17.6635
   273    7 45 29      16.5227       313    9 79 69      17.6918
   274    7 50 76      16.5529       314    9 85 96      17.7200
   275    7 56 25      16.5831       315    9 92 25      17.7482
   276    7 61 76      16.6132       316    9 98 56      17.7764
   277    7 67 29      16.6433       317   10 04 89      17.8045
   278    7 72 84      16.6733       318   10 11 24      17.8326
   279    7 78 41      16.7033       319   10 17 61      17.8606
   280    7 84 00      16.7332       320   10 24 00      17.8885

Number     Square  Square root    Number     Square  Square root
   321   10 30 41      17.9165       361   13 03 21      19.0000
   322   10 36 84      17.9444       362   13 10 44      19.0263
   323   10 43 29      17.9722       363   13 17 69      19.0526
   324   10 49 76      18.0000       364   13 24 96      19.0788
   325   10 56 25      18.0278       365   13 32 25      19.1050
   326   10 62 76      18.0555       366   13 39 56      19.1311
   327   10 69 29      18.0831       367   13 46 89      19.1572
   328   10 75 84      18.1108       368   13 54 24      19.1833
   329   10 82 41      18.1384       369   13 61 61      19.2094
   330   10 89 00      18.1659       370   13 69 00      19.2354
   331   10 95 61      18.1934       371   13 76 41      19.2614
   332   11 02 24      18.2209       372   13 83 84      19.2873
   333   11 08 89      18.2483       373   13 91 29      19.3132
   334   11 15 56      18.2757       374   13 98 76      19.3391
   335   11 22 25      18.3030       375   14 06 25      19.3649
   336   11 28 96      18.3303       376   14 13 76      19.3907
   337   11 35 69      18.3576       377   14 21 29      19.4165
   338   11 42 44      18.3848       378   14 28 84      19.4422
   339   11 49 21      18.4120       379   14 36 41      19.4679
   340   11 56 00      18.4391       380   14 44 00      19.4936
   341   11 62 81      18.4662       381   14 51 61      19.5192
   342   11 69 64      18.4932       382   14 59 24      19.5448
   343   11 76 49      18.5203       383   14 66 89      19.5704
   344   11 83 36      18.5472       384   14 74 56      19.5959
   345   11 90 25      18.5742       385   14 82 25      19.6214
   346   11 97 16      18.6011       386   14 89 96      19.6469
   347   12 04 09      18.6279       387   14 97 69      19.6723
   348   12 11 04      18.6548       388   15 05 44      19.6977
   349   12 18 01      18.6815       389   15 13 21      19.7231
   350   12 25 00      18.7083       390   15 21 00      19.7484
   351   12 32 01      18.7350       391   15 28 81      19.7737
   352   12 39 04      18.7617       392   15 36 64      19.7990
   353   12 46 09      18.7883       393   15 44 49      19.8242
   354   12 53 16      18.8149       394   15 52 36      19.8494
   355   12 60 25      18.8414       395   15 60 25      19.8746
   356   12 67 36      18.8680       396   15 68 16      19.8997
   357   12 74 49      18.8944       397   15 76 09      19.9249
   358   12 81 64      18.9209       398   15 84 04      19.9499
   359   12 88 81      18.9473       399   15 92 01      19.9750
   360   12 96 00      18.9737       400   16 00 00      20.0000

Number     Square  Square root    Number     Square  Square root
   401   16 08 01      20.0250       441   19 44 81      21.0000
   402   16 16 04      20.0499       442   19 53 64      21.0238
   403   16 24 09      20.0749       443   19 62 49      21.0476
   404   16 32 16      20.0998       444   19 71 36      21.0713
   405   16 40 25      20.1246       445   19 80 25      21.0950
   406   16 48 36      20.1494       446   19 89 16      21.1187
   407   16 56 49      20.1742       447   19 98 09      21.1424
   408   16 64 64      20.1990       448   20 07 04      21.1660
   409   16 72 81      20.2237       449   20 16 01      21.1896
   410   16 81 00      20.2485       450   20 25 00      21.2132
   411   16 89 21      20.2731       451   20 34 01      21.2368
   412   16 97 44      20.2978       452   20 43 04      21.2603
   413   17 05 69      20.3224       453   20 52 09      21.2838
   414   17 13 96      20.3470       454   20 61 16      21.3073
   415   17 22 25      20.3715       455   20 70 25      21.3307
   416   17 30 56      20.3961       456   20 79 36      21.3542
   417   17 38 89      20.4206       457   20 88 49      21.3776
   418   17 47 24      20.4450       458   20 97 64      21.4009
   419   17 55 61      20.4695       459   21 06 81      21.4243
   420   17 64 00      20.4939       460   21 16 00      21.4476
   421   17 72 41      20.5183       461   21 25 21      21.4709
   422   17 80 84      20.5426       462   21 34 44      21.4942
   423   17 89 29      20.5670       463   21 43 69      21.5174
   424   17 97 76      20.5913       464   21 52 96      21.5407
   425   18 06 25      20.6155       465   21 62 25      21.5639
   426   18 14 76      20.6398       466   21 71 56      21.5870
   427   18 23 29      20.6640       467   21 80 89      21.6102
   428   18 31 84      20.6882       468   21 90 24      21.6333
   429   18 40 41      20.7123       469   21 99 61      21.6564
   430   18 49 00      20.7364       470   22 09 00      21.6795
   431   18 57 61      20.7605       471   22 18 41      21.7025
   432   18 66 24      20.7846       472   22 27 84      21.7256
   433   18 74 89      20.8087       473   22 37 29      21.7486
   434   18 83 56      20.8327       474   22 46 76      21.7715
   435   18 92 25      20.8567       475   22 56 25      21.7945
   436   19 00 96      20.8806       476   22 65 76      21.8174
   437   19 09 69      20.9045       477   22 75 29      21.8403
   438   19 18 44      20.9284       478   22 84 84      21.8632
   439   19 27 21      20.9523       479   22 94 41      21.8861
   440   19 36 00      20.9762       480   23 04 00      21.9089

Number     Square  Square root    Number     Square  Square root
   561   31 47 21      23.6854       601   36 12 01      24.5153
   562   31 58 44      23.7065       602   36 24 04      24.5357
   563   31 69 69      23.7276       603   36 36 09      24.5561
   564   31 80 96      23.7487       604   36 48 16      24.5764
   565   31 92 25      23.7697       605   36 60 25      24.5967
   566   32 03 56      23.7908       606   36 72 36      24.6171
   567   32 14 89      23.8118       607   36 84 49      24.6374
   568   32 26 24      23.8328       608   36 96 64      24.6577
   569   32 37 61      23.8537       609   37 08 81      24.6779
   570   32 49 00      23.8747       610   37 21 00      24.6982
   571   32 60 41      23.8956       611   37 33 21      24.7184
   572   32 71 84      23.9165       612   37 45 44      24.7386
   573   32 83 29      23.9374       613   37 57 69      24.7588
   574   32 94 76      23.9583       614   37 69 96      24.7790
   575   33 06 25      23.9792       615   37 82 25      24.7992
   576   33 17 76      24.0000       616   37 94 56      24.8193
   577   33 29 29      24.0208       617   38 06 89      24.8395
   578   33 40 84      24.0416       618   38 19 24      24.8596
   579   33 52 41      24.0624       619   38 31 61      24.8797
   580   33 64 00      24.0832       620   38 44 00      24.8998
   581   33 75 61      24.1039       621   38 56 41      24.9199
   582   33 87 24      24.1247       622   38 68 84      24.9399
   583   33 98 89      24.1454       623   38 81 29      24.9600
   584   34 10 56      24.1661       624   38 93 76      24.9800
   585   34 22 25      24.1868       625   39 06 25      25.0000
   586   34 33 96      24.2074       626   39 18 76      25.0200
   587   34 45 69      24.2281       627   39 31 29      25.0400
   588   34 57 44      24.2487       628   39 43 84      25.0599
   589   34 69 21      24.2693       629   39 56 41      25.0799
   590   34 81 00      24.2899       630   39 69 00      25.0998
   591   34 92 81      24.3105       631   39 81 61      25.1197
   592   35 04 64      24.3311       632   39 94 24      25.1396
   593   35 16 49      24.3516       633   40 06 89      25.1595
   594   35 28 36      24.3721       634   40 19 56      25.1794
   595   35 40 25      24.3926       635   40 32 25      25.1992
   596   35 52 16      24.4131       636   40 44 96      25.2190
   597   35 64 09      24.4336       637   40 57 69      25.2389
   598   35 76 04      24.4540       638   40 70 44      25.2587
   599   35 88 01      24.4745       639   40 83 21      25.2784
   600   36 00 00      24.4949       640   40 96 00      25.2982

Number     Square  Square root    Number     Square  Square root
   641   41 08 81      25.3180       681   46 37 61      26.0960
   642   41 21 64      25.3377       682   46 51 24      26.1151
   643   41 34 49      25.3574       683   46 64 89      26.1343
   644   41 47 36      25.3772       684   46 78 56      26.1534
   645   41 60 25      25.3969       685   46 92 25      26.1725
   646   41 73 16      25.4165       686   47 05 96      26.1916
   647   41 86 09      25.4362       687   47 19 69      26.2107
   648   41 99 04      25.4558       688   47 33 44      26.2298
   649   42 12 01      25.4755       689   47 47 21      26.2488
   650   42 25 00      25.4951       690   47 61 00      26.2679
   651   42 38 01      25.5147       691   47 74 81      26.2869
   652   42 51 04      25.5343       692   47 88 64      26.3059
   653   42 64 09      25.5539       693   48 02 49      26.3249
   654   42 77 16      25.5734       694   48 16 36      26.3439
   655   42 90 25      25.5930       695   48 30 25      26.3629
   656   43 03 36      25.6125       696   48 44 16      26.3818
   657   43 16 49      25.6320       697   48 58 09      26.4008
   658   43 29 64      25.6515       698   48 72 04      26.4197
   659   43 42 81      25.6710       699   48 86 01      26.4386
   660   43 56 00      25.6905       700   49 00 00      26.4575
   661   43 69 21      25.7099       701   49 14 01      26.4764
   662   43 82 44      25.7294       702   49 28 04      26.4953
   663   43 95 69      25.7488       703   49 42 09      26.5141
   664   44 08 96      25.7682       704   49 56 16      26.5330
   665   44 22 25      25.7876       705   49 70 25      26.5518
   666   44 35 56      25.8070       706   49 84 36      26.5707
   667   44 48 89      25.8263       707   49 98 49      26.5895
   668   44 62 24      25.8457       708   50 12 64      26.6083
   669   44 75 61      25.8650       709   50 26 81      26.6271
   670   44 89 00      25.8844       710   50 41 00      26.6458
   671   45 02 41      25.9037       711   50 55 21      26.6646
   672   45 15 84      25.9230       712   50 69 44      26.6833
   673   45 29 29      25.9422       713   50 83 69      26.7021
   674   45 42 76      25.9615       714   50 97 96      26.7208
   675   45 56 25      25.9808       715   51 12 25      26.7395
   676   45 69 76      26.0000       716   51 26 56      26.7582
   677   45 83 29      26.0192       717   51 40 89      26.7769
   678   45 96 84      26.0384       718   51 55 24      26.7955
   679   46 10 41      26.0576       719   51 69 61      26.8142
   680   46 24 00      26.0768       720   51 84 00      26.8328

Number     Square  Square root    Number     Square  Square root
   721   51 98 41      26.8514       761   57 91 21      27.5862
   722   52 12 84      26.8701       762   58 06 44      27.6043
   723   52 27 29      26.8887       763   58 21 69      27.6225
   724   52 41 76      26.9072       764   58 36 96      27.6405
   725   52 56 25      26.9258       765   58 52 25      27.6586
   726   52 70 76      26.9444       766   58 67 56      27.6767
   727   52 85 29      26.9629       767   58 82 89      27.6948
   728   52 99 84      26.9815       768   58 98 24      27.7128
   729   53 14 41      27.0000       769   59 13 61      27.7308
   730   53 29 00      27.0185       770   59 29 00      27.7489
   731   53 43 61      27.0370       771   59 44 41      27.7669
   732   53 58 24      27.0555       772   59 59 84      27.7849
   733   53 72 89      27.0740       773   59 75 29      27.8029
   734   53 87 56      27.0924       774   59 90 76      27.8209
   735   54 02 25      27.1109       775   60 06 25      27.8388
   736   54 16 96      27.1293       776   60 21 76      27.8568
   737   54 31 69      27.1477       777   60 37 29      27.8747
   738   54 46 44      27.1662       778   60 52 84      27.8927
   739   54 61 21      27.1846       779   60 68 41      27.9106
   740   54 76 00      27.2029       780   60 84 00      27.9285
   741   54 90 81      27.2213       781   60 99 61      27.9464
   742   55 05 64      27.2397       782   61 15 24      27.9643
   743   55 20 49      27.2580       783   61 30 89      27.9821
   744   55 35 36      27.2764       784   61 46 56      28.0000
   745   55 50 25      27.2947       785   61 62 25      28.0179
   746   55 65 16      27.3130       786   61 77 96      28.0357
   747   55 80 09      27.3313       787   61 93 69      28.0535
   748   55 95 04      27.3496       788   62 09 44      28.0713
   749   56 10 01      27.3679       789   62 25 21      28.0891
   750   56 25 00      27.3861       790   62 41 00      28.1069
   751   56 40 01      27.4044       791   62 56 81      28.1247
   752   56 55 04      27.4226       792   62 72 64      28.1425
   753   56 70 09      27.4408       793   62 88 49      28.1603
   754   56 85 16      27.4591       794   63 04 36      28.1780
   755   57 00 25      27.4773       795   63 20 25      28.1957
   756   57 15 36      27.4955       796   63 36 16      28.2135
   757   57 30 49      27.5136       797   63 52 09      28.2312
   758   57 45 64      27.5318       798   63 68 04      28.2489
   759   57 60 81      27.5500       799   63 84 01      28.2666
   760   57 76 00      27.5681       800   64 00 00      28.2843

Number     Square  Square root    Number     Square  Square root
   801   64 16 01      28.3019       841   70 72 81      29.0000
   802   64 32 04      28.3196       842   70 89 64      29.0172
   803   64 48 09      28.3373       843   71 06 49      29.0345
   804   64 64 16      28.3549       844   71 23 36      29.0517
   805   64 80 25      28.3725       845   71 40 25      29.0689
   806   64 96 36      28.3901       846   71 57 16      29.0861
   807   65 12 49      28.4077       847   71 74 09      29.1033
   808   65 28 64      28.4253       848   71 91 04      29.1204
   809   65 44 81      28.4429       849   72 08 01      29.1376
   810   65 61 00      28.4605       850   72 25 00      29.1548
   811   65 77 21      28.4781       851   72 42 01      29.1719
   812   65 93 44      28.4956       852   72 59 04      29.1890
   813   66 09 69      28.5132       853   72 76 09      29.2062
   814   66 25 96      28.5307       854   72 93 16      29.2233
   815   66 42 25      28.5482       855   73 10 25      29.2404
   816   66 58 56      28.5657       856   73 27 36      29.2575
   817   66 74 89      28.5832       857   73 44 49      29.2746
   818   66 91 24      28.6007       858   73 61 64      29.2916
   819   67 07 61      28.6182       859   73 78 81      29.3087
   820   67 24 00      28.6356       860   73 96 00      29.3258
   821   67 40 41      28.6531       861   74 13 21      29.3428
   822   67 56 84      28.6705       862   74 30 44      29.3598
   823   67 73 29      28.6880       863   74 47 69      29.3769
   824   67 89 76      28.7054       864   74 64 96      29.3939
   825   68 06 25      28.7228       865   74 82 25      29.4109
   826   68 22 76      28.7402       866   74 99 56      29.4279
   827   68 39 29      28.7576       867   75 16 89      29.4449
   828   68 55 84      28.7750       868   75 34 24      29.4618
   829   68 72 41      28.7924       869   75 51 61      29.4788
   830   68 89 00      28.8097       870   75 69 00      29.4958
   831   69 05 61      28.8271       871   75 86 41      29.5127
   832   69 22 24      28.8444       872   76 03 84      29.5296
   833   69 38 89      28.8617       873   76 21 29      29.5466
   834   69 55 56      28.8791       874   76 38 76      29.5635
   835   69 72 25      28.8964       875   76 56 25      29.5804
   836   69 88 96      28.9137       876   76 73 76      29.5973
   837   70 05 69      28.9310       877   76 91 29      29.6142
   838   70 22 44      28.9482       878   77 08 84      29.6311
   839   70 39 21      28.9655       879   77 26 41      29.6479
   840   70 56 00      28.9828       880   77 44 00      29.6648

Number     Square  Square root    Number     Square  Square root
   881   77 61 61      29.6816       921   84 82 41      30.3480
   882   77 79 24      29.6985       922   85 00 84      30.3645
   883   77 96 89      29.7153       923   85 19 29      30.3809
   884   78 14 56      29.7321       924   85 37 76      30.3974
   885   78 32 25      29.7489       925   85 56 25      30.4138
   886   78 49 96      29.7658       926   85 74 76      30.4302
   887   78 67 69      29.7825       927   85 93 29      30.4467
   888   78 85 44      29.7993       928   86 11 84      30.4631
   889   79 03 21      29.8161       929   86 30 41      30.4795
   890   79 21 00      29.8329       930   86 49 00      30.4959
   891   79 38 81      29.8496       931   86 67 61      30.5123
   892   79 56 64      29.8664       932   86 86 24      30.5287
   893   79 74 49      29.8831       933   87 04 89      30.5450
   894   79 92 36      29.8998       934   87 23 56      30.5614
   895   80 10 25      29.9166       935   87 42 25      30.5778
   896   80 28 16      29.9333       936   87 60 96      30.5941
   897   80 46 09      29.9500       937   87 79 69      30.6105
   898   80 64 04      29.9666       938   87 98 44      30.6268
   899   80 82 01      29.9833       939   88 17 21      30.6431
   900   81 00 00      30.0000       940   88 36 00      30.6594
   901   81 18 01      30.0167       941   88 54 81      30.6757
   902   81 36 04      30.0333       942   88 73 64      30.6920
   903   81 54 09      30.0500       943   88 92 49      30.7083
   904   81 72 16      30.0666       944   89 11 36      30.7246
   905   81 90 25      30.0832       945   89 30 25      30.7409
   906   82 08 36      30.0998       946   89 49 16      30.7571
   907   82 26 49      30.1164       947   89 68 09      30.7734
   908   82 44 64      30.1330       948   89 87 04      30.7896
   909   82 62 81      30.1496       949   90 06 01      30.8058
   910   82 81 00      30.1662       950   90 25 00      30.8221
   911   82 99 21      30.1828       951   90 44 01      30.8383
   912   83 17 44      30.1993       952   90 63 04      30.8545
   913   83 35 69      30.2159       953   90 82 09      30.8707
   914   83 53 96      30.2324       954   91 01 16      30.8869
   915   83 72 25      30.2490       955   91 20 25      30.9031
   916   83 90 56      30.2655       956   91 39 36      30.9192
   917   84 08 89      30.2820       957   91 58 49      30.9354
   918   84 27 24      30.2985       958   91 77 64      30.9516
   919   84 45 61      30.3150       959   91 96 81      30.9677
   920   84 64 00      30.3315       960   92 16 00      30.9839

Number     Square  Square root    Number     Square  Square root
   961   92 35 21      31.0000       981   96 23 61      31.3209
   962   92 54 44      31.0161       982   96 43 24      31.3369
   963   92 73 69      31.0322       983   96 62 89      31.3528
   964   92 92 96      31.0483       984   96 82 56      31.3688
   965   93 12 25      31.0644       985   97 02 25      31.3847
   966   93 31 56      31.0805       986   97 21 96      31.4006
   967   93 50 89      31.0966       987   97 41 69      31.4166
   968   93 70 24      31.1127       988   97 61 44      31.4325
   969   93 89 61      31.1288       989   97 81 21      31.4484
   970   94 09 00      31.1448       990   98 01 00      31.4643
   971   94 28 41      31.1609       991   98 20 81      31.4802
   972   94 47 84      31.1769       992   98 40 64      31.4960
   973   94 67 29      31.1929       993   98 60 49      31.5119
   974   94 86 76      31.2090       994   98 80 36      31.5278
   975   95 06 25      31.2250       995   99 00 25      31.5436
   976   95 25 76      31.2410       996   99 20 16      31.5595
   977   95 45 29      31.2570       997   99 40 09      31.5753
   978   95 64 84      31.2730       998   99 60 04      31.5911
   979   95 84 41      31.2890       999   99 80 01      31.6070
   980   96 04 00      31.3050      1000  100 00 00      31.6228


Table C.2. Four-place Common Logarithms of Numbers (Base 10)*

 N     0     1     2     3     4     5     6     7     8     9
 0        0000  3010  4771  6021  6990  7782  8451  9031  9542
 1  0000  0414  0792  1139  1461  1761  2041  2304  2553  2788
 2  3010  3222  3424  3617  3802  3979  4150  4314  4472  4624
 3  4771  4914  5051  5185  5315  5441  5563  5682  5798  5911
 4  6021  6128  6232  6335  6435  6532  6628  6721  6812  6902
 5  6990  7076  7160  7243  7324  7404  7482  7559  7634  7709
 6  7782  7853  7924  7993  8062  8129  8195  8261  8325  8388
 7  8451  8513  8573  8633  8692  8751  8808  8865  8921  8976
 8  9031  9085  9138  9191  9243  9294  9345  9395  9445  9494
 9  9542  9590  9638  9685  9731  9777  9823  9868  9912  9956
10  0000  0043  0086  0128  0170  0212  0253  0294  0334  0374
11  0414  0453  0492  0531  0569  0607  0645  0682  0719  0755
12  0792  0828  0864  0899  0934  0969  1004  1038  1072  1106
13  1139  1173  1206  1239  1271  1303  1335  1367  1399  1430
14  1461  1492  1523  1553  1584  1614  1644  1673  1703  1732
15  1761  1790  1818  1847  1875  1903  1931  1959  1987  2014
16  2041  2068  2095  2122  2148  2175  2201  2227  2253  2279
17  2304  2330  2355  2380  2405  2430  2455  2480  2504  2529
18  2553  2577  2601  2625  2648  2672  2695  2718  2742  2765
19  2788  2810  2833  2856  2878  2900  2923  2945  2967  2989
20  3010  3032  3054  3075  3096  3118  3139  3160  3181  3201
21  3222  3243  3263  3284  3304  3324  3345  3365  3385  3404
22  3424  3444  3464  3483  3502  3522  3541  3560  3579  3598
23  3617  3636  3655  3674  3692  3711  3729  3747  3766  3784
24  3802  3820  3838  3856  3874  3892  3909  3927  3945  3962
25  3979  3997  4014  4031  4048  4065  4082  4099  4116  4133
26  4150  4166  4183  4200  4216  4232  4249  4265  4281  4298
27  4314  4330  4346  4362  4378  4393  4409  4425  4440  4456
28  4472  4487  4502  4518  4533  4548  4564  4579  4594  4609
29  4624  4639  4654  4669  4683  4698  4713  4728  4742  4757
30  4771  4786  4800  4814  4829  4843  4857  4871  4886  4900
31  4914  4928  4942  4955  4969  4983  4997  5011  5024  5038
32  5051  5065  5079  5092  5105  5119  5132  5145  5159  5172
33  5185  5198  5211  5224  5237  5250  5263  5276  5289  5302
34  5315  5328  5340  5353  5366  5378  5391  5403  5416  5428
35  5441  5453  5465  5478  5490  5502  5514  5527  5539  5551
36  5563  5575  5587  5599  5611  5623  5635  5647  5658  5670
37  5682  5694  5705  5717  5729  5740  5752  5763  5775  5786
38  5798  5809  5821  5832  5843  5855  5866  5877  5888  5899
39  5911  5922  5933  5944  5955  5966  5977  5988  5999  6010
40  6021  6031  6042  6053  6064  6075  6085  6096  6107  6117
41  6128  6138  6149  6160  6170  6180  6191  6201  6212  6222
42  6232  6243  6253  6263  6274  6284  6294  6304  6314  6325
43  6335  6345  6355  6365  6375  6385  6395  6405  6415  6425
44  6435  6444  6454  6464  6474  6484  6493  6503  6513  6522
45  6532  6542  6551  6561  6571  6580  6590  6599  6609  6618
46  6628  6637  6646  6656  6665  6675  6684  6693  6702  6712
47  6721  6730  6739  6749  6758  6767  6776  6785  6794  6803
48  6812  6821  6830  6839  6848  6857  6866  6875  6884  6893
49  6902  6911  6920  6928  6937  6946  6955  6964  6972  6981
50  6990  6998  7007  7016  7024  7033  7042  7050  7059  7067


Prop. Parts

      22    21    20    19    18    17    16    15    14    13    12    11     9     8
 1   2.2   2.1   2.0   1.9   1.8   1.7   1.6   1.5   1.4   1.3   1.2   1.1   0.9   0.8
 2   4.4   4.2   4.0   3.8   3.6   3.4   3.2   3.0   2.8   2.6   2.4   2.2   1.8   1.6
 3   6.6   6.3   6.0   5.7   5.4   5.1   4.8   4.5   4.2   3.9   3.6   3.3   2.7   2.4
 4   8.8   8.4   8.0   7.6   7.2   6.8   6.4   6.0   5.6   5.2   4.8   4.4   3.6   3.2
 5  11.0  10.5  10.0   9.5   9.0   8.5   8.0   7.5   7.0   6.5   6.0   5.5   4.5   4.0
 6  13.2  12.6  12.0  11.4  10.8  10.2   9.6   9.0   8.4   7.8   7.2   6.6   5.4   4.8
 7  15.4  14.7  14.0  13.3  12.6  11.9  11.2  10.5   9.8   9.1   8.4   7.7   6.3   5.6
 8  17.6  16.8  16.0  15.2  14.4  13.6  12.8  12.0  11.2  10.4   9.6   8.8   7.2   6.4
 9  19.8  18.9  18.0  17.1  16.2  15.3  14.4  13.5  12.6  11.7  10.8   9.9   8.1   7.2


Table C.3. Natural Logarithms (Base e)*


.10-.99 



divide the number by 10 and add 2.303 to the ln obtained,
or divide the number by 100 and add 4.605 to the ln obtained,
or divide the number by 1,000 and add 6.908 to the ln obtained, etc.

To obtain the natural logarithm of numbers less than .1:

multiply the number by 10 and subtract 2.303 from the ln obtained,
or multiply by 100 and subtract 4.605 from the ln obtained,
or multiply by 1,000 and subtract 6.908 from the ln obtained, etc.
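
The shifting rule above rests on the identity ln x = ln(x / 10^k) + k ln 10, with ln 10 ≈ 2.303. The sketch below is one way to express it in Python. It assumes, for illustration only, that the table covers arguments from .1 up to (but not including) 1, and it uses `math.log` as a stand-in for the table lookup; the function name `ln_by_shifting` is hypothetical.

```python
import math

LN10 = math.log(10.0)  # approximately 2.303


def ln_by_shifting(x, lookup=math.log):
    """Natural log of x via a lookup that is only trusted on [0.1, 1).

    `lookup` stands in for reading the table; the range is an assumption.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    k = 0
    while x >= 1.0:      # too large: divide by 10, add ln 10 afterwards
        x /= 10.0
        k += 1
    while x < 0.1:       # too small: multiply by 10, subtract ln 10 afterwards
        x *= 10.0
        k -= 1
    return lookup(x) + k * LN10


if __name__ == "__main__":
    print(round(LN10, 3), round(2 * LN10, 3), round(3 * LN10, 3))
    print(ln_by_shifting(250.0), math.log(250.0))
```

The constants 2.303, 4.605, and 6.908 in the instructions are ln 10, 2 ln 10, and 3 ln 10 rounded to three decimals.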


Special Values

   α      β     ln[(1 - β)/α]    ln[β/(1 - α)]
  .01    .01        4.595           -4.595
  .01    .05        4.554           -2.986
  .05    .01        2.986           -4.554
  .05    .05        2.944           -2.944
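
The four rows of special values can be reproduced directly. The sketch below assumes the two logarithm columns are ln[(1 − β)/α] and ln[β/(1 − α)], a reading that matches every tabled entry to three decimals; the function name `special_values` is illustrative.

```python
import math


def special_values(alpha, beta):
    """Return (ln[(1 - beta)/alpha], ln[beta/(1 - alpha)]) under the assumed column headings."""
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    return upper, lower


if __name__ == "__main__":
    for alpha, beta in [(0.01, 0.01), (0.01, 0.05), (0.05, 0.01), (0.05, 0.05)]:
        upper, lower = special_values(alpha, beta)
        print(alpha, beta, round(upper, 3), round(lower, 3))
```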


* By permission from Introduction to statistical analysis, by W. J. Dixon and F. J.
Massey. Copyright 1951, McGraw-Hill Book Company, Inc.
Table C.5. Transformation of r to z (and ρ to ζ)*

   r      z       r      z       r      z       r      z       r      z       r      z
 .25†   .26     .40    .42     .55    .62     .70    .87     .85   1.26    .950   1.83
 .26    .27     .41    .44     .56    .63     .71    .89     .86   1.29    .955   1.89
 .27    .28     .42    .45     .57    .65     .72    .91     .87   1.33    .960   1.95
 .28    .29     .43    .46     .58    .66     .73    .93     .88   1.38    .965   2.01
 .29    .30     .44    .47     .59    .68     .74    .95     .89   1.42    .970   2.09
 .30    .31     .45    .48     .60    .69     .75    .97     .90   1.47    .975   2.18
 .31    .32     .46    .50     .61    .71     .76   1.00    .905   1.50    .980   2.30
 .32    .33     .47    .51     .62    .73     .77   1.02    .910   1.53    .985   2.44
 .33    .34     .48    .52     .63    .74     .78   1.05    .915   1.56    .990   2.65
 .34    .35     .49    .54     .64    .76     .79   1.07    .920   1.59    .995   2.99
 .35    .37     .50    .55     .65    .78     .80   1.10    .925   1.62
 .36    .38     .51    .56     .66    .79     .81   1.13    .930   1.66
 .37    .39     .52    .58     .67    .81     .82   1.16    .935   1.70
 .38    .40     .53    .59     .68    .83     .83   1.19    .940   1.74
 .39    .41     .54    .60     .69    .85     .84   1.22    .945   1.78


* By permission from Fundamental statistics in psychology and education by J P 
Guilford. Copyright 1950, McGraw-Hill Book Company, Inc. The values in 
Table C.5 were derived by interpolation from Table VB of R. A. Fisher, Statistical 
methods for Research workers, published by Oliver & Boyd, Ltd., Edinburgh by 
permission of the author and publishers, 
f For all values of r below .25, r = 


0. 
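
The tabled values agree with Fisher's transformation z = ½ log_e[(1 + r)/(1 - r)]. The formula is not printed with the table, so the sketch below states it as an assumption; the function names are hypothetical.

```python
import math


def r_to_z(r):
    """Fisher's transformation of a correlation coefficient, -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1 + r) / (1 - r))


def z_to_r(z):
    """Inverse transformation (the hyperbolic tangent)."""
    return math.tanh(z)
```

Rounded to two decimals, `r_to_z(.70)` gives .87 and `r_to_z(.95)` gives 1.83, matching the table; for r below .25 the rounded z equals r, as the footnote says.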
Table C.6. Derivatives

$\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$

$\frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + g(x)f'(x)$

$\frac{d}{dx}[f(x)]^c = c[f(x)]^{c-1}f'(x)$

$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)f'(x) - f(x)g'(x)}{[g(x)]^2}$

$\frac{d}{dx}c^{f(x)} = \log_e c \, f'(x) \, e^{f(x)\log_e c}$

$\frac{d}{dx}c = 0$

$\frac{d}{dx}x = 1$

$\frac{d}{dx}cf(x) = cf'(x)$

$\frac{d}{dx}cx = c$

$\frac{d}{dx}x^c = cx^{c-1}$

$\frac{d}{dx}e^{f(x)} = e^{f(x)}f'(x)$

$\frac{d}{dx}e^x = e^x$

$\frac{d}{dx}\log_c f(x) = \frac{1}{\log_e c}\,\frac{f'(x)}{f(x)}$

$\frac{d}{dx}\log_e f(x) = \frac{f'(x)}{f(x)}$

$\frac{d}{dx}\log_e x = \frac{1}{x}$

$\frac{d}{dx}\sin x = \cos x$

$\frac{d}{dx}\cos x = -\sin x$

$\frac{d}{dx}\tan x = \sec^2 x$
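
Any rule in the table can be checked numerically against a central-difference quotient. The sketch below does this for the quotient rule and for the derivative of a constant raised to a function; the particular f, g, and c are illustrative assumptions, not taken from the text.

```python
import math


def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Illustrative choices (assumptions): f(x) = x^2 + 1, g(x) = e^x, c = 3.
def f(x):
    return x * x + 1


def f_prime(x):
    return 2 * x


def g(x):
    return math.exp(x)


def g_prime(x):
    return math.exp(x)


C = 3.0


def quotient_rule(x):
    """d/dx [f(x)/g(x)] as given in the table."""
    return (g(x) * f_prime(x) - f(x) * g_prime(x)) / g(x) ** 2


def constant_power_rule(x):
    """d/dx c^f(x) = log_e c * f'(x) * e^(f(x) log_e c)."""
    return math.log(C) * f_prime(x) * math.exp(f(x) * math.log(C))
```

Comparing `quotient_rule(0.7)` with `numeric_derivative(lambda t: f(t) / g(t), 0.7)` shows agreement to several decimal places.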





Table C.7. Primitive Functions (Indefinite Integrals) 

$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$

$\int a f(x)\,dx = a \int f(x)\,dx$

$\int a\,dx = ax + c$

$\int a x^b\,dx = \frac{a x^{b+1}}{b+1} + c \qquad b \neq -1$

$\int \frac{1}{x}\,dx = \log_e x + c$

$\int \log_e x\,dx = x \log_e x - x + c$

$\int a^x\,dx = \frac{a^x}{\log_e a} + c \qquad a$ positive

$\int e^x\,dx = e^x + c$

$\int e^{ax}\,dx = \frac{1}{a}\,e^{ax} + c$

$\int \frac{1}{x} \log_e x\,dx = \frac{1}{2}(\log_e x)^2 + c$

$\int \frac{a}{x+b}\,dx = a \log_e (x + b) + c$

$\int x e^{ax^2}\,dx = \frac{e^{ax^2}}{2a} + c$

$\int \sin x\,dx = -\cos x + c$

$\int \cos x\,dx = \sin x + c$

$\int x^n \cos x\,dx = x^n \sin x - n \int x^{n-1} \sin x\,dx \qquad n$ positive ($x$ in radians; $2\pi$ radians = 360°)

$\int x^n \sin x\,dx = -x^n \cos x + n \int x^{n-1} \cos x\,dx \qquad n$ positive ($x$ in radians)

$\int x^n e^x\,dx = x^n e^x - n \int x^{n-1} e^x\,dx \qquad n$ positive
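
The last reduction formula lends itself to a short recursive sketch. The function names and the Simpson's-rule check are illustrative assumptions added here, not part of the table.

```python
import math


def primitive_xn_exp(n, x):
    """A primitive of x**n * e**x, built from the reduction formula."""
    if n == 0:
        return math.exp(x)  # integral of e^x is e^x
    return x ** n * math.exp(x) - n * primitive_xn_exp(n - 1, x)


def simpson(func, a, b, steps=1000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# Definite integral of x^3 e^x from 0 to 2, computed two ways.
exact = primitive_xn_exp(3, 2.0) - primitive_xn_exp(3, 0.0)
approx = simpson(lambda t: t ** 3 * math.exp(t), 0.0, 2.0)
```

The constant of integration cancels in the definite integral, so the two values should agree closely.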





Appendix D 

Tables of Sampling Distributions


Notes on the Use of Tables 

interpolation is reasonably accurate ^ ^ ‘ Lmear 

relSn f,“7 p F ” “**“"■ «“ «* 

Table n\ n 1 ; ( ] interpolation is reasonably accurate 

p ”»>» r Wth.„ 

tributed with zero mean amfunit variance^ ‘ PPK ™“ tel 5' <*» 

I'het,"p”“d “*” d tSllt 


$2.358 + \dfrac{1/60 - 1/120}{1/40 - 1/120}\,(2.423 - 2.358) = 2.3905$


' /IjJU 

rivlf t h”7„et?74 to th “ V “‘“ “ * h,, ‘* ble ° f 2 ' 3 ”' Li »“ interpolation 

^“rSr " F ” •<». -020, .01, 

$\mathrm{prob}(F_{m,n} < F_1) = \mathrm{prob}(1/F_{m,n} > 1/F_1) = 1 - \mathrm{prob}(F_{n,m} < 1/F_1)$

For example, to find the value $F_1$ such that $\mathrm{prob}(F_{4,7} < F_1) = .025$, take

$\mathrm{prob}(F_{4,7} < F_1) = 1 - \mathrm{prob}(F_{7,4} < 1/F_1) = .025$
$\mathrm{prob}(F_{7,4} < 1/F_1) = .975$

From the table, $1/F_1 = 9.07$; therefore $F_1 = .110$.

^ 17,25 = Fi 5 20 — - ~ 5 ) /E* p . 

(Mo - Mo) (//l5 *20 -F 15i30 ) 


- Mi) 


(Ms _ 
(Ms — Mo) 


(^15,20 — F 20,20) = 2.79 
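
The first worked interpolation in these notes (2.358 and 2.423 giving 2.3905) is consistent with linear interpolation in the reciprocals of the degrees of freedom, here taken to be 120, 40, and 60. That reading is an assumption, since the surrounding text is damaged; the sketch below only illustrates the arithmetic under it, with a hypothetical function name.

```python
def interpolate_in_reciprocal_df(df, df_a, value_a, df_b, value_b):
    """Interpolate linearly in 1/df between two tabled entries.

    (df_a, value_a) and (df_b, value_b) are the neighbouring table entries.
    """
    weight = (1 / df - 1 / df_a) / (1 / df_b - 1 / df_a)
    return value_a + weight * (value_b - value_a)


# Assumed entries: 2.358 at 120 degrees of freedom, 2.423 at 40.
estimate = interpolate_in_reciprocal_df(60, 120, 2.358, 40, 2.423)
```

Under these assumptions the weight is one half, so the estimate is the midpoint of the two tabled values.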
Table D.1. Ordinates of Normal Density Function with Zero Mean and Unit Variance


$f(x) = \phi(0,1) = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$


X 

.00 

.01 

.02 

.03 

.04 

.05 

.06 

.07 

.08 

.09 

.0 

.1 

.2 

.3 

.4 

.3989 

.3970 

.3910 

.3814 

.3683 

.3989 

.3965 

.3902 

.3802 

.3668 

.3989 

.3961 

.3894 

.3790 

.3653 

.3988 

.3956 

.3885 

.3778 

.3637 

.3986 

.3951 

.3876 

.3765 

.3621 

.3984 

.3945 

.3867 

.3752 

.3605 

.3982 

.3939 

.3857 

.3739 

.3589 

.3980 

.3932 

.3847 

.3725 

.3572 

.3977 

.3925 

.3836 

.3712 

.3555 

.3973 

.3918 

.3825 

.3697 

.3538 

.5 

.6 

.7 

.8 

.9 

.3521 

.3332 

.3123 

.2897 

.2661 

.3503 

.3312 

.3101 

.2874 

.2637 

.3485 

.3292 

.3079 

.2850 

.2613 

.3467 

.3271 

.3056 

.2827 

.2589 

.3448 

.3251 

.3034 

.2803 

.2565 

.3429 

.3230 

.3011 

.2780 

.2541 

.3410 

.3209 

.2989 

.2756 

.2516 

.3391 

.3187 

.2966 

.2732 

.2492 

.3372 

.3166 

.2943 

.2709 

.2468 

.3352 

.3144 

.2920 

.2685 

.2444 

1.0 

1.1 

1.2 

1.3 

1.4 

.2420 

.2179 

.1942 

.1714 

.1497 

.2396 

.2155 

.1919 

.1691 

.1476 

.2371 

.2131 

.1895 

.1669 

.1456 

.2347 

.2107 

.1872 

.1647 

.1435 

.2323 

.2083 

.1849 

.1626 

.1415 

.2299 

.2059 

.1826 

.1604 

.1394 

.2275 

.2036 

.1804 

.1582 

.1374 

.2251 

.2012 

.1781 

.1561 

.1354 

.2227 

.1989 

.1758 

.1539 

.1334 

.2203 

.1965 

.1736 

.1518 

.1315 

1.5 

1.6 

1.7 

1.8 
1.9 

.1295 
.1109 
.0940 
.0790 
. 0656 1 

.1276 

.1092 

.0925 

.0775 

.0644 

.1257 

.1074 

.0909 

.0761 

.0632 

.1238 

.1057 

.0893 

.0748 

.0620 

.1219 

.1040 

.0878 

.0734 

.0608 

.1200 

.1023 

.0863 

.0724 

.0596 

.1182 

.1006 

.0848 

.0707 

.0584 

.1163 

.0989 

.0833 

.0694 

.0573 

.1145 

.0973 

.0818 

.0681 

.0562 

.1127 

.0957 

.0804 

.0669 

.0551 

2.0 

2.1 

2.2 

2.3 

2.4 

.0540 

.0440 

.0355 

.0283 

.0224 

.0529 
.0431 
.0347 
.0277 
: .0219 

.0519 
.0422 
.0339 
.0270 
■ .0213 

.0508 

.0413 

.0332 

.0264 

.0208 

.0498 
.0404 
.0325 
.0258 
; .0203 

.0488 
.0396 
.0317 
.0252 
: .0198 

.0478 
.0387 
.0310 
.0246 
; .0194 

.0468 
.0379 
i .0303 
, .0241 
L .0189 

.0459 
.0371 
.0297 
.0235 
i .0184 

.0449 

.0363 

.0290 

.0229 

.0180 

2.5 

2.6 

2.7 

2.8 
2.9 

.0175 

.0136 

.0104 

.0079 

.006C 

1 .0171 

> .0132 
t .0101 

> .007; 
) .0058 

.0167 
! .0129 
.0099 
' .0075: 
l .005( 

' .0163 
► .0126 

> .0096 

> .0078 
J .0058 

! .0158 
i .0122 

> .0098 
} .0071 

> .0058 

I .0151 
! .0111 
! .0091 
L .0061 
! .0051 

L .0151 
) .0116 
: .0088 
) .006; 
L .005( 

. .0147 
> .0112 
l .0086 
1 .0061 
) .0048 

r .0143 
5 .0110 
i .0084 
> .0062 
$ . 004 ; 

; .0139 

1 .0107 
l .0081 
! .0061 
’ .0046 

3.0 

3.1 

3.2 

3.3 

3.4 

.004^ 

.0035 

.002' 

.ooi*: 

.001! 

[ .0045 
5 .003! 

1 .0025 
7 .001i 

2 .001! 

1 .004! 

2 .003: 
1 .002! 
7 -001( 
2 .001! 

2 .004( 
L ,003( 
2 .002! 

3 .001( 
2 .001 

) .003* 
) .002 ( 
2 .002 
3 .001 
1 .001 

) .003! 

) .002! 
L .0021 
5 .001J 
1 .0011 

i . 003 : 

l .002' 
) .002< 
5 .001- 
3 .001< 

7 .003( 
7 .002( 
1 .001! 
4 .001- 
0 .OOK 

3 .0031: 
3 .0021 
1 .0011 

4 .001! 
0 .000! 

j .0034 
j .0025 

3 .0018 

3 .0013 

1 .0009 

3.5 

3.6 

3.7 

3.8 

3.9 

.000! 

.0001 

.000 

.000 

.000 

9 .000! 

6 .O00 
4 .000 
3 .000 
2 .000 

8 .000: 

6 .000' 

4 .000 

3 .000 

2 .000 

8 .000 
6 .000 
4 .000 
3 .000 
2 .000 

S .000! 
5 .000 
4 .000 
3 .000 
2 .000 

8 .000' 
5 .000 
4 .000 
3 .000 
2 .000 

7 .000 
5 .000 
4 .000 
2 .000 
2 .000 

7 .000 
5 .000 
3 .000 
2 .000 
2 .000 

7 .000' 
5 .000 
3 .000 
2 .000 
2 .000 

7 .0006 
5 .0004 
3 .0003 
2 .0002 
1 .0001 
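Table D.2. Cumulative Normal Distributions


$F(x) = \int_{-\infty}^{x} \phi(0,1)\,dt = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt$

$\phantom{F(x)} = \int_{-\infty}^{\mu + x\sigma} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(1/2\sigma^2)(t-\mu)^2}\,dt = \int_{-\infty}^{\mu + x\sigma} \phi(\mu,\sigma^2)\,dt$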


  x     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

  .0   .5000   .5040   .5080   .5120   .5160   .5199   .5239   .5279   .5319   .5359
  .1   .5398   .5438   .5478   .5517   .5557   .5596   .5636   .5675   .5714   .5753
  .2   .5793   .5832   .5871   .5910   .5948   .5987   .6026   .6064   .6103   .6141
  .3   .6179   .6217   .6255   .6293   .6331   .6368   .6406   .6443   .6480   .6517
  .4   .6554   .6591   .6628   .6664   .6700   .6736   .6772   .6808   .6844   .6879
  .5   .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
  .6   .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
  .7   .7580   .7611   .7642   .7673   .7704   .7734   .7764   .7794   .7823   .7852
  .8   .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
  .9   .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389
 1.0   .8413   .8438   .8461   .8485   .8508   .8531   .8554   .8577   .8599   .8621
 1.1   .8643   .8665   .8686   .8708   .8729   .8749   .8770   .8790   .8810   .8830
 1.2   .8849   .8869   .8888   .8907   .8925   .8944   .8962   .8980   .8997   .9015
 1.3   .9032   .9049   .9066   .9082   .9099   .9115   .9131   .9147   .9162   .9177
 1.4   .9192   .9207   .9222   .9236   .9251   .9265   .9279   .9292   .9306   .9319
 1.5   .9332   .9345   .9357   .9370   .9382   .9394   .9406   .9418   .9429   .9441
 1.6   .9452   .9463   .9474   .9484   .9495   .9505   .9515   .9525   .9535   .9545
 1.7   .9554   .9564   .9573   .9582   .9591   .9599   .9608   .9616   .9625   .9633
 1.8   .9641   .9649   .9656   .9664   .9671   .9678   .9686   .9693   .9699   .9706
 1.9   .9713   .9719   .9726   .9732   .9738   .9744   .9750   .9756   .9761   .9767
 2.0   .9772   .9778   .9783   .9788   .9793   .9798   .9803   .9808   .9812   .9817
 2.1   .9821   .9826   .9830   .9834   .9838   .9842   .9846   .9850   .9854   .9857
 2.2   .9861   .9864   .9868   .9871   .9875   .9878   .9881   .9884   .9887   .9890
 2.3   .9893   .9896   .9898   .9901   .9904   .9906   .9909   .9911   .9913   .9916
 2.4   .9918   .9920   .9922   .9925   .9927   .9929   .9931   .9932   .9934   .9936
 2.5   .9938   .9940   .9941   .9943   .9945   .9946   .9948   .9949   .9951   .9952
 2.6   .9953   .9955   .9956   .9957   .9959   .9960   .9961   .9962   .9963   .9964
 2.7   .9965   .9966   .9967   .9968   .9969   .9970   .9971   .9972   .9973   .9974
 2.8   .9974   .9975   .9976   .9977   .9977   .9978   .9979   .9979   .9980   .9981
 2.9   .9981   .9982   .9982   .9983   .9984   .9984   .9985   .9985   .9986   .9986
 3.0   .9987   .9987   .9987   .9988   .9988   .9989   .9989   .9989   .9990   .9990
 3.1   .9990   .9991   .9991   .9991   .9992   .9992   .9992   .9992   .9993   .9993
 3.2   .9993   .9993   .9994   .9994   .9994   .9994   .9994   .9995   .9995   .9995
 3.3   .9995   .9995   .9995   .9996   .9996   .9996   .9996   .9996   .9996   .9997
 3.4   .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9997   .9998



 x              1.282   1.645   1.960   2.326   2.576   3.090   3.291    3.891     4.417
 F(x)            .90     .95     .975    .99     .995    .999    .9995    .99995    .999995
 2[1 - F(x)]     .20     .10     .05     .02     .01     .002    .001     .0001     .00001
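
Entries of Tables D.1 and D.2 can be recomputed from the defining formulas. The sketch below is a minimal illustration using the error function from Python's standard library; the function names are hypothetical, and the identity F(x) = ½[1 + erf(x/√2)] is a standard result rather than something stated in the tables.

```python
import math


def normal_density(x):
    """Ordinate of the normal density with zero mean and unit variance."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def two_tailed_probability(x):
    """2[1 - F(x)], the last row of the short table above."""
    return 2 * (1 - normal_cdf(x))
```

For instance, `round(normal_cdf(1.96), 4)` reproduces the tabled .9750 and `round(normal_density(0), 4)` reproduces .3989.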
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* By permission from Introduction to the theory of statistics, by A. M. Mood. Copyright 1950, McGraw-Hill Book Company, Inc. 
This table is abridged from Catherine M. Thompson, Tables of percentage points of the incomplete beta function and of the chi- 
square distribution, Biometrika , vol. 32, 1941, published with permission of the author and of the editor of Biometrika . 

































Table D.4. Cumulative t Distributions* 


F(t n ) 
* By permission from Introduction to the theory of statistics, by A. M. Mood.
Copyright 1950, McGraw-Hill Book Company, Inc. This table is abridged from 
Table IV of R. A. Fisher and F. Yates, Statistical tables for biological, agricultural 
and medical research, published by Oliver & Boyd, Ltd., Edinburgh, by permission 
of the authors and publishers. 



Table D.5. Cumulative F Distributions* 
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* By permission from Introduction to the theory of statistics , by A. M. Mood. Copyright 1950, McGraw-Hill Book Company, Inc. 
This table is abridged from Tables of percentage points of the inverted beta distribution, Biometrika , vol. 33, 1943, published with 
the permission of the authors Maxine Merrington and Catherine M. Thompson, and of the editor of Biometrika . 
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Table D.6. Critical Values of r for Sign Test* 
(Two-tailed percentage points for the binomial for p = .5) 
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17 

18 

20 

6 


0 

0 

1 

51 

15 

18 

19 

20 

7 


0 

0 

1 

52 

16 

18 

19 

21 

8 

0 

0 

1 

1 

53 

16 

18 

20 

21 

9 

0 

1 

1 

2 

54 

17 

19 

20 

22 

10 

0 

1 

1 

2 

55 

17 

19 

20 

22 

11 

0 

1 

2 

3 

56 

17 

20 

21 

23 

12 

1 

2 

2 

3 

57 

18 

20 

21 

23 

13 

1 

2 

3 

3 

58 

18 

21 

22 

24 

14 

1 

2 

3 

4 

59 

19 

21 

22 

24 

15 

2 

3 

3 

4 

60 

19 

21 

23 

25 

16 

2 

3 

4 

5 

61 

20 

22 

23 

25 

17 

2 

4 

4 

5 

62 

20 

22 

24 

25 

18 

3 

4 

5 

6 

63 

20 

23 

24 

26 

19 

3 

4 

5 

6 

64 

21 

23 

24 

26 

20 

3 

5 

5 

6 

65 

21 

24 

25 

27 

21 

4 

5 

6 

7 

66 

22 

24 

25 

27 

22 

4 

5 

6 

7 

67 

22 

25 

26 

28 

23 

4 

6 ' 

7 

8 

68 

22 

25 

26 

28 

24 

5 

6 

7 

8 

69 

23 

25 

27 

29 

25 

5 

7 

7 

9 

70 

23 

26 

27 

29 

26 

6 

7 

8 

9 

71 

24 

26 

28 

30 

27 

6 

7 

8 

10 

72 

24 

27 

28 

30 

28 

6 

8 

9 

10 

73 

25 

27 

28 

31 

29 

7 

8 

9 

10 

74 

25 

28 

29 

31 

30 

7 

9 

10 

11 

75 

25 

28 

29 

32 

31 

7 

9 

10 

11 

76 

26 

28 

30 

32 

32 

8 

9 

10 

12 

77 

26 

29 

30 

32 

33 

8 

10 

11 

12 

78 

27 

29 

31 

33 

34 

9 

10 

11 

13 

79 

27 

30 

31 

33 

35 

9 

11 

12 

13 

80 

28 

30 

32 

34 

36 

9 

11 

12 

14 

81 

28 

31 

32 

34 

37 

10 

12 

13 

14 

82 

28 

31 

33 

35 

38 

10 

12 

13 

14 

83 

29 

32 

33 

35 

39 

11 

12 

13 

15 

84 

29 

32 

33 

36 

40 

11 

13 

14 

15 

85 

30 

32 

34 

36 

41 

11 

13 

14 

16 

86 

30 

33 

34 

37 

42 

12 

14 

15 

16 

87 

31 

33 

35 

37 

43 

12 

14 

15 

17 

88 

31 

34 

35 

38 

44 

13 

15 

16 

17 

89 

31 

34 

36 

38 

45 

13 

15 

16 

18 

90 

32 

35 

36 

39 


* By permission from Introduction to statistical analysis , by W. J. Dixon and F. J. 
Massey. Copyright 1951, McGraw-Hill Book Company, Inc. This table 
originally appeared in The statistical sign test, by W. J. Dixon and A. M. Mood, 
Journal of the American Statistical Association, vol. 41, pp. 557-566, 1946, and is 
reprinted by permission of the authors and publisher. 
Table D.7. Critical Values for Number of Runs (r) in Samples
of $n_1$ a's and $n_2$ b's, $u_{.025}$ Being Largest Integer Such That
$P(r \le u_{.025}) \le .025$ and $u_{.975}$ Being Smallest Integer Such
That $P(r \ge u_{.975}) \le .025$*


$u_{.025}$



2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 19 

20 

2 



















3 



















4 



















5 



2 

2 















6 


2 

2 

3 

3 














7 


2 

2 

3 

3 

3 













8 


2 

3 

3 

3 

4 

4 












9 


2 

3 

3 

4 

4 

5 

5 











10 


2 

3 

3 

4 

5 

5 

5 

6 










11 


2 

3 

4 

4 

5 

5 

6 

6 

7 









12 

2 

2 

3 

4 

4 

5 

6 

6 

7 

7 

7 








13 

2 

2 

3 

4 

5 

5 

6 

6 

7 

7 

8 

8 







14 

2 

2 

3 

4 

5 

5 

6 

7 

7 

8 

8 

9 

9 






15 

2 

3 

3 

4 

5 

6 

6 

7 

7 

8 

8 

9 

9 

10 





16 

2 

3 

4 

4 

5 

6 

6 

7 

8 

8 

9 

9 

10 

10 

11 




17 

2 

3 

4 

4 

5 

6 

7 

7 

8 

9 

9 

10 

10 

11 

11 

11 



18 

2 

3 

4 

5 

5 

6 

7 

8 

8 

9 

9 

10 

10 

11 

11 

12 

12 


19 

2 

3 

4 

5 

6 

6 

7 

8 

8 

9 

10 

10 

11 

11 

12 

12 

13 13 


20 

2 

3 

4 

5 

6 

6 

7 

8 

9 

9 

10 

10 

11 

12 

12 

13 

13 13 

14 


* Adapted by permission from Introduction to statistical analysis , by W. J. Dixon 
and F. J. Massey. Copyright 1951, McGraw-Hill Book Company, Inc. This 
table is an abridgment, with permission, from Frieda S. Swed and C. Eisenhart, 
Tables for testing randomness of grouping in a sequence of alternatives, Annals of 
Mathematical Statistics, vol. 14, p. 66, 1943. 
Table D.7. Critical Values for Number of Runs (r) in Samples
of $n_1$ a's and $n_2$ b's, $u_{.025}$ Being Largest Integer Such That
$P(r \le u_{.025}) \le .025$ and $u_{.975}$ Being Smallest Integer Such
That $P(r \ge u_{.975}) \le .025$* (Continued)


$u_{.975}$


\«1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

32 

13 

1 ! 

15 

16 

17 

18 

19 20 




















1 



















2 

3 



















4 

6 


















5 

6 

8 

9 

10 















6 

6 

8 

9 

10 

11 














7 

6 

8 

10 

11 

12 

13 













8 

6 

8 

10 

11 

12 

13 

14 












9 

6 

8 

10 

12 

13 

14 

14 

15 











10 

6 

8 

10 

12 

13 

14 

15 

16 

16 










11 

6 

8 

10 

12 

13 

14 

15 

16 

17 

17 









12 

6 

8 

10 

12 

13 

14 

16 

16 

17 

18 

19 








13 

6 

8 

10 

12 

14 

15 

16 

17 

18 

19 

19 

20 







14 

6 

8 

10 

12 

14 

15 

16 

17 

18 

19 

20 

20 

21 






15 

6 

8 

10 

12 

14 

15 

16 

18 

18 

19 

20 

21 

22 

22 





16 

6 

8 

10 

12 

14 

16 

17 

18 

19 

20 

21 

21 

22 

23 

23 




17 

6 

8 

10 

12 

14 

16 

17 

18 

19 

20 

21 

22 

23 

23 

24 

25 



18 

6 

8 

10 

12 

14 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

25 

26 


19 

6 

8 

10 

12 

14 

16 

17 

18 

20 

21 

22 

23 

23 

24 

25 

26 

26 

27 

20 

6 

8 

10 

12 

14 

16 

17 

18 

20 

21 

22 

23 

24 

25 

25 

26 

27 

27 28 


* Adapted by permission from Introduction to statistical analysis , by W. J. Dixon 
and F. J. Massey. Copyright 1951, McGraw-Hill Book Company, Inc. This 
table is an abridgment, with permission, from Frieda S. Swed and C. Eisenhart, 
Tables for testing randomness of grouping in a sequence of alternatives, Annals of 
Mathematical Statistics, vol. 14, p. 66, 1943. 
Table D.7. Critical Values for Number of Runs (r) in Samples
of $n_1$ a's and $n_2$ b's, $u_{.025}$ Being Largest Integer Such That
$P(r \le u_{.025}) \le .025$ and $u_{.975}$ Being Smallest Integer Such
That $P(r \ge u_{.975}) \le .025$* (Continued)


$n_1 = n_2$   $u_{.025}$   $u_{.975}$        $n_1 = n_2$   $u_{.025}$   $u_{.975}$

    20         14         28               40         31         51
    21         15         29               42         33         53
    22         16         30               44         35         55
    23         16         32               46         37         57
    24         17         33               48         38         60
    25         18         34               50         40         62
    26         19         35               55         45         67
    27         20         36               60         49         73
    28         21         37               65         54         78
    29         22         38               70         58         84
    30         22         40               75         63         89
    32         24         42               80         68         94
    34         26         44               85         72        100
    36         28         46               90         77        105
    38         30         48               95         82        110
                                          100         86        116


The values listed permit one to make a two-tailed test at the .05 level or a 
one-tailed test at the .025 level. 

For values of $n_1$ and $n_2$ larger than 20, a normal approximation can be used.
The mean is $\dfrac{2n_1n_2}{n_1 + n_2} + 1$, and the variance is
$\dfrac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}$.


* Adapted by permission from Introduction to statistical analysis , by W. J. Dixon 
and F. J. Massey. Copyright 1951, McGraw-Hill Book Company, Inc. This 
table is an abridgment, with permission, from Frieda S. Swed and C. Eisenhart, 
Tables for testing randomness of grouping in a sequence of alternatives, Annals of 
Mathematical Statistics, vol. 14, p. 66, 1943.
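
The normal approximation for the number of runs, given in the note to Table D.7, can be sketched as follows. The function names are hypothetical, and the critical ratio omits any continuity correction, which is an assumption of this sketch.

```python
import math


def count_runs(sequence):
    """Number of runs (maximal blocks of identical symbols) in a sequence."""
    runs = 0
    previous = object()  # sentinel that equals no element
    for item in sequence:
        if item != previous:
            runs += 1
            previous = item
    return runs


def runs_mean_variance(n1, n2):
    """Mean and variance of the number of runs for n1 a's and n2 b's."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return mean, variance


def runs_critical_ratio(r, n1, n2):
    """Standardized number of runs, to be referred to the normal table."""
    mean, variance = runs_mean_variance(n1, n2)
    return (r - mean) / math.sqrt(variance)
```

With $n_1 = n_2 = 20$ the mean is 21 and the variance about 9.74, so 21 ± 1.96√9.74 spans roughly 14.9 to 27.1, in line with the tabled critical values 14 and 28.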
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Table D.8. Probability (p) That Smaller Sum of Ranks in 
Wilcoxon’s Matched-pairs Signed-ranks Test Will Be 
Equal to or Less than T, if Differences in 
Population Are Symmetrically Distributed 
about Zero* 

(T is given in the body of the table, to the nearest integer. N is the 
number of differences) 


  N     p = .05    p = .02    p = .01

  6        0
  7        2          0
  8        4          2          0
  9        6          3          2
 10        8          5          3
 11       11          7          5
 12       14         10          7
 13       17         13         10
 14       21         16         13
 15       25         20         16
 16       30         24         20
 17       35         28         23
 18       40         33         28
 19       46         38         32
 20       52         43         38
 21       59         49         43
 22       66         56         49
 23       73         62         55
 24       81         69         61
 25       89         77         68



* Table D.8. is reproduced from Frank Wilcoxon, Some rapid approximate sta¬ 
tistical procedures , published by American Cyanamid Co., New York, 1949. It is 
here published with the kind permission of the author and his publishers. The 
author states that the table was obtained by rounding off values given by John W. 
Tukey in Memorandum Report 17, The simplest signed rank tests , Statistical 
Research Group, Princeton University, 1949. 


GLOSSARY 


— is equal by definition to 

$\mathrm{Cov}_{xy}$ = covariance, that is, $E\{[X - E(X)][Y - E(Y)]\}$

$E$ = expectation or mean value. $E[g(x)] = \sum_{\text{all } x} g(x)f(x)$ in the discrete
case and $\int_{-\infty}^{\infty} g(x)f(x)\,dx$ in the continuous case.

$E(X)$ = the arithmetic mean

$m$ = the arithmetic mean of a sample

$\mathrm{mom}_c$ = the $k$th moment about the point $c$, that is, $E[(X - c)^k]$

$\mu$ = mu, the arithmetic mean of a population

= a normal distribution with mean $\mu$ and variance $\sigma^2$ or a normal
population with mean $\mu$ and variance $\sigma^2$, depending upon the
context

p-mcc = Pearson product-moment coefficient of correlation, that is,
$\mathrm{Cov}_{xy} / \sqrt{\mathrm{Var}_x \mathrm{Var}_y}$

$\phi(\mu,\sigma^2)$ = a normal frequency function with mean $\mu$ and variance $\sigma^2$

$r$ = the sample p-mcc

$\rho$ = rho, the population p-mcc

$s$ = the sample standard deviation

$s^2$ = the sample variance

$\sigma$ = sigma, the population standard deviation

$\sigma^2$ = the population variance

$\mathrm{Var}_x$ = variance, that is, $E[X - E(X)]^2$
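
The glossary definitions of expectation, variance, covariance, and the p-mcc translate directly into code for a discrete joint frequency function. The dictionary representation and the function names below are assumptions chosen for illustration.

```python
import math


def expectation(g, joint):
    """E[g(X, Y)] for a discrete joint frequency function {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def covariance(joint):
    """E{[X - E(X)][Y - E(Y)]}."""
    mean_x = expectation(lambda x, y: x, joint)
    mean_y = expectation(lambda x, y: y, joint)
    return expectation(lambda x, y: (x - mean_x) * (y - mean_y), joint)


def variances(joint):
    """(Var_x, Var_y), each computed as E[X - E(X)]^2."""
    mean_x = expectation(lambda x, y: x, joint)
    mean_y = expectation(lambda x, y: y, joint)
    var_x = expectation(lambda x, y: (x - mean_x) ** 2, joint)
    var_y = expectation(lambda x, y: (y - mean_y) ** 2, joint)
    return var_x, var_y


def pmcc(joint):
    """Cov_xy / sqrt(Var_x Var_y)."""
    var_x, var_y = variances(joint)
    return covariance(joint) / math.sqrt(var_x * var_y)


# Hypothetical distributions: independent coin pair, and perfectly related pair.
independent = {(0, 0): .25, (0, 1): .25, (1, 0): .25, (1, 1): .25}
identical = {(0, 0): .5, (1, 1): .5}
```

The first distribution has covariance 0 and the second has a p-mcc of 1.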
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ANSWERS TO ODD-NUMBERED 
EXERCISES 


CHAPTER 1 


1.3.1. b, f. 1.3.3. (a) Each screw is a member; its diameter is its value.
(b) Each performance is a member; its value is satisfactory or unsatisfactory.
(c) Each blood cell is a member; its value is red or white. (d) Each conversation
is a member; its value is its duration. (e) Each tube is a member; its
value is defective or nondefective. 1.7.1. (J-4,0), (J-^,1), (J-ij2). 1.7.3. (%,0),

(H,l). 1 . 7 . 5 . ( 2 H5,0), ( 2 Hs,l), (Hs,2); ( 2 Hs,0), ( 4 Hs,l), (1,2). 1 - 7 . 7 . (a) 

( b) (H- 1), (H,0), (HA); (Hfi), (U)- 

(c) (%, absence), (l{, presence), (d) (H, for), (H, against). 1.8.1. (.24,1), 
(.76,2); (.24,1), (1,2). 1 . 8 . 3 . (% 98 ,1), ( 21 H 96 ,2), ( 27 H 95 ,3); (Ji»s,l), 
( 1 ^ 3 , 2 ), (1,3). 1.8.5. (% 5 ,0), (MbA), («,2); (Ks, same), (% 5 , different). 

1 . 8 . 7 . P“,, ns . nt = n!/ni!n 2 ! • ■ • n k \ 1 . 8 . 9 . (Hie,3), (Hie,4), (Hie,5), 

( l Hie,6),’ (‘Hie,7), ( 2 Hie,8), ( 2 Hie,9), ( 2 Hie,10), ( 2 Hie,H), ( 2 Hie,12), 
( 2 Hie,13), etc. 1.8.11.(3^6 5,0), ( 8 Vl6 5,1), ( 41Q1 '* 
f(x) = Cl°2™-*/3 30 . 


„2), (H65,3)• 1 . 8 . 13 . 


CHAPTER 2 

2.2.1. (a) Random. (b) Violates second condition (independence). (c)
Random. (d) Random. (e) Violates first condition; for example, a name
on page 2100 has a greater probability of being drawn than a name on page
1386, if each of the two pages contains the same number of names. (f) Violates
both conditions. (g) Violates first condition. (h) Violates both conditions.
2.2.3. Both conditions. A person in a small community has greater
probability of being included in the sample than a person in a large community.
The second condition is obviously violated. 2.2.5. (a) (1,0).
(b) (H,--5) (H,0) (H,-5). (c) (.3,0) (.6,.5) (.1,1). (d) (.1,0) (.4,.5) (.3,1) 
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(.4,1), (.6,2). 2.2.11. }(x) = C™C¥L X /C\™. 2.3.1. y 120 . 2.3.3. (a) CfC?/ 
C?°. ( b ) (C?° + CfC 6 6 ° + Cl°C^ 0 )/C 9 7 °. (c) (C 3 4°C 6 3 0 + C! 0 Cf + Cl°C\° + 
Cf)/Cf. (d)C?VC 9 7 °. 2.3.5. (a) 1^04. (&) l Ko 2 . (c) % 4 - (d) 

2.3.7. (a) 7 %56. (&) %56. (c) 2S Hs6- (d) 5 J 64 . 2.3.9. (o) ?< 7 . ( 6 ) J.f. 

(c) %• (d) Hi- (e) %• (/) 2.3.11. (a) y 12 . ( b ) ^ 2 - (c) 

W H- (e) H- if) Vi 2 - (g) H. ( h) %. ( i) %. (j) %. (jfc) j 6 . 2.3.13. 
(o) .31466 .... (b) .68533 .... (c) .48. (d) .17866 .... (e) .048. 

(/) .7066 - (g) .65866 - 2.4.1. (a) ) 6 . (6) %. ( c ) J*. 2.4.3. 

(«) Vi2. ( b ) H. (c) %. (d) M 3 . (e) y 2i . (/) % 8 . ( ff ) 2^ 92 . %2 . 

2.4.5. (^o) 5 . 2.4.7. .21. 2.4.9. (a) J* 0 . (6) % 5 . 


CHAPTER 3 



0 

1 

2 

H e 

1 

0 

0 

Hr 


H 

0 

Hi 

'A 

H 

It 

h 3 

0 

H 

H 

Hr 

0 

0 

1 


3.1.3. (a) f(x) = CjCSL/CT. (6) fix) = (c) /(*) = CfC\_J 

C 4- Unless the ratings are independent, the second condition for random 
sampling will be violated even though the candidates themselves are chosen at 
random. 


3.1.5. 



0 

1 

2 

3 

Ho 

1 

0 

0 

0 

Hr 

H 

y 3 

0 

0 

Ho 

Vl2 

H 

Vl 2 

0 

Ho 

Hi 


Hi 

H 4 

Hr 

54 2 

X* 

Hu 

Hi 

H t 

Hi 

H.4 

Hu 







etc 
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3.1.7. (a) fix) = CTCZJCl* 0 . (b) fix) = ic) fix) = 

C 1 , 000 C 1 , 000 / C 2 , 0 ° 0 _ (d) f( x ) = 3.2.1. id) Reject at .01 level of 

confidence. (b) Reject at .02 level of confidence. (c) Do not reject (p =
.20). (d) Do not reject (p = 1); do not accept, either. 3.2.3. (a) Reject at
.0004 level of confidence. (b) Reject at .0009 level of confidence. (c) Reject
at .01 level. (d) Reject at .04 level. 3.2.5. Reject at .015625 level.
3.3.1. 2. 3.4.1. See Table 3.5.3. 3.7.1. .024. 3.7.3. Reject at .022 level.
3.7.5. Borderline; the p value is .054. 3.7.7. Reject at .05 level. 3.7.9.
Reject at .02 level.


C™C\llJCir. ic) fix) 


CHAPTER 4 

4.3.1. (a) 1. ib) 2. ic) id) 1.95. (e) 1.40. 4.3.3. (a) for. (6) 

for. 4 . 3 . 5 . (a) 1;.5. (6)1.5; .75. ic)}i;%. id). 6;%- (e) H i; 25 Ho5- 
4.3.7. (a) .4; .09. (6) .8; .21. 4.3.9. He 0; 0. He. .5; .25. He 1; .4286. 
H z \ 1.5; .5356. He 2; .5720. He 2.5; .5356. H 6 : 3; .4286. He- 3.5; .25. 

H 8 : 4; 0. The mean of the number of orange chips in a sample is in each case 
four times the mean number of orange chips hypothesized. If we take as a 
statistic not the number of orange chips but the mean number, all the above 
means are divided by four; that is, the mean of the mean number of orange 
chips in a sample is in each case the mean number of orange chips hypothesized. 
4.3.11. H 0 :0;0. He 2.5; 1.87. H 2 : 5; 2.50. He 7.5; 1.88. H 4 :10;0. If we 
take as statistic the mean number of orange chips in a sequence of 10 draws, 
each of the means is divided by 10, and is equal in each case to the mean num- 

h m » 

ber hypothesized. 4.4.1. (a) V xlfixf). (6) V (* — i) 2 /(f). ic) ^ Vj- 

i =1 7=1 J=6 

& 15 40 25 

(d) ^ (i - c)\ (e) 2 (i + a) i+2 . (/) [] x - 4 * 4 * 3 ’ W X 


i = 2 
30 

X' 0/^30 
/ W ^30- 

i = 15 


70/-Y30 m 100 
i 2ft_i/ C- 30 • 


I cf(K) 3 °. 

i = 10 

id) .001415. 


(c) £ CfiH) 30 . 4.7.1. (a) .15. (6) No median, (c) .1619. 

(e) .038° 4.7.3. E(X 3 ) - 3 EiX)EiX>) + 2 [E(X)}\ 


CHAPTER 5 


5.4.1. (a) Hypergeo metric; p = ^ xfix); <r 2 — ^ (x — pYfix) with 

x=0 x=0 

50 50 

fix) = CfCl^'/Cir ib) Binomial; p = Y xfix)-,a 2 = £ (* - m) 2 /M 
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50 50 
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t>u 50 

with/(a;) = Cl 0 (K)*(y 6 y°-*. ( c ) Binomial;/* = £ xf(x); <r 2 = j (* _ nYf(x) 

®=o at = 0 

50 50 

with /(#) = ^(HD^lD 60- *. (d) Binomial; \x = ^ £/(#); <r 2 = Y (# _ 

^=0 x—O 

100 

/*) 2 /(a) with /(a:) = Cl^He^He) 60 -*. (e) Binomial; p = ^ »/(»); a 2 = 

100 

^ 10 
2/ ^ ” m) 2 /W with/(a:) = Ci 00 0^) 100 * (/) Hypergeometric;/i = ^ ^/(rr); 

K *M 0 

= 2, (x - fi) 2 f(x) with/(a:) = CfCll^JCll (g) Binomial; p = J x f(x) ; 

^50° * = 0 

= 2 (* - /*) 2 /(z) with /(a:) = Cl\H 2 y{sy 32 y«-z. (h) Binomial; u = 

at = 0 

1 

^ xf(x); <r 2 = ^ (x - p) 2 f(x) with f(x) = Cj(27% 37 )*(36% 37 )i-* (*•) 

; — 0 a: = 0 

1 1 

Binomial;/* = ^ xf(x);<r 2 = ^ (x — /i) 2 /(:r) with/(a;) = C'i( 3 % 1 )*( 5 ^ 1 )i-*, 

a; = 0 at = 0 

(j) Neither. 5.4.3. $\mu = p$; $\sigma^2 = pq$. 5.4.5. The variance of the number of
A's is $npq$. As the proportion of A's is the number divided by the sample
size, the variance of the proportion is $npq/n^2 = pq/n$. 5.5.1. (a) f(x) =

C 5 X °(W X (%) S °- X . (b) f{x) = CliNJNYiiN - A0)/jV]»-*. 5.5.3. ^ 

^V(l-p) 26 -. x=0 


x — 0 


5.6.1. 


X 

Observed 

Binomial 

6 

.005 

.000 

5 

.005 

.000 

4 

.010 

.006 

3 

.035 

.043 

2 

.155 

.180 

1 

.405 

.400 

0 

.385 

.371 


CHAPTER 6 

2 

6.1.1. (a) f(x) = e~ x /x\. (b) f(x) = (np) x e~ n *>/x\. 6.5.1. ^ 10 x e~ 10 /xl. 


x — 0 


6.5.3. .82. 
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CHAPTER 8 


8.3.1. (a) Hos. ( b ) *> 08 . (c) 0. (d) *%os- («) 7 «o8. (/) 1Ss Aos- 

8.3.3. (a) /(*) = 1 - x/2. ( 6 ) /(1) = H- M *(*) = * - *74. W Me- 
(e) .4225. 8.3.5. (a) /(*) = *7 2 - H- (ft) F(*) = f{ 6 ~ 

8.3.7. (a) l/x. ( 6) .12. 8.3.9. (a) -3x*/U + x. (ft) * /14 + *-/ 2 

6/14. 8.5.1. (a) 2. (6) .61. 8.5.3. (a) 2 log * - 3. (6) e-. (c) .30. 
8.5.5. a, c, d, e, /, g, j, k. 8.7.1. Exercise 8.3.1: .421; .232. Excise 8.3.2: 
U- Mj, Exercise 8.3.3: %; %■ Exercise 8.3.4: 2; Exercise 8.3.5. /g, 
203 ^ 880 . Exercise 8.3.6: Exercise 8.3.7: 1.718; .24. Exercise 

8.3.8: .386; .04. Exercise 8.3.9: 25 K68i 1146 K4ii2o- 8.7.3. (a) . (J . 


CHAPTER 9 


9 . 2 . 1 . (a) Cannot. (6) 0; 1/vV- («) 3 i V3\/2ir. (d) -3; l/W^- 

(,) 0, 1/V2. (/> 3. 1/VS. 9.2.3. /,«'"< 1/2Vis*-"'*-'-- 0 ’*. 
9.2.5. f 150 ' 5 ( l / 6\/^) e ~ anmix ^ 10a)1 dx - 9 - 4 - 1 - i / Sv 72 ®-. ( ft ) -1; V 2 * 

9.4.3. (a) 1.000. (6) .68. 9.4.5. (a) .117. (ft) .629. 9.4.7. 1.00. 9.4.9. 

.954. 9.6.1. (a) .383. (ft) .533. (c) .309. 9.6.3. (a) Below .6 or above 59.4. 

(6) Below 5.33. (c) Above 124.68. 9.5.5. <Hm* + Mi,, 13). 9-J-T- 7.8 to 

12.8. 9.5.9. (a) .265, .001, .000. ( 6 ) .265, .378, .000. (c) L 000, .000, .164. 

9.5.11. 166. 9.7.1. (a) Cannot reject. (b) 49.16 to 50.84. 9.7.3. They
would not be independent; therefore, one of the assumptions of random
sampling would be violated. Even if we define the population sampled as the
population of this one subject’s reaction times, it would be difficult to devise an
experimental procedure justifying the assumption of independence. 9.7.5.
The 36 scores are not independent, as two were obtained from each patient.
9.7.7. (a) Reject at the .01 level of confidence. (b) .52 to .76. 9.7.9. As
the proportion is unknown, we maximize the error estimate by taking p = 1/2;
then n > 4,144. 9.8.1. (a) .29. (b) .14. 9.8.3. (a) Reject at the .00
level of confidence. (b) -13.516 to -7.484. 9.8.5. (a) p > .13. (b)
p > .31. (c) -.21 to .03. (d) .27 to .45. 9.8.7. (a) Yes, as far as the
statistical test is concerned. (b) .000277 and .000545. (c) That the 225
products produced by each model have been produced in such a way that there
is no trend in the magnitudes of errors from the first member of the sample to
the last. Also, as only one model of each is built, it is assumed that any
differences in precision between different models of the same design would be
negligible in comparison with differences between designs. 9.8.9. (a) .74 to
.85. (b) That whatever factors are responsible for volunteering are unrelated
to susceptibility to the vaccine. 9.8.11. The test based on the mean and
variance of the 100 differences, yielding a critical ratio of 7.5. This test is
based upon the central-limit theorem, not Theorem 9.8.2. The population
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is the class of pairs {B — A for subject X, subject X}. The other test 
assumes the two samples to be drawn independently. 
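
The reasoning in 9.7.9 can be sketched numerically. The exercise statement is not reproduced in this answer section, so the inputs below are assumptions: a normal deviate of 2.575 and a maximum error of .02 are one combination consistent with the stated bound, not necessarily the values the exercise used. The function name is hypothetical.

```python
def min_sample_size(z, max_error, p=0.5):
    """Real-valued n at which z * sqrt(p(1 - p)/n) equals max_error."""
    return (z / max_error) ** 2 * p * (1.0 - p)

# p(1 - p) is largest at p = .5, so that choice gives the most conservative n.
worst_case = max(p * (1 - p) for p in [i / 100 for i in range(101)])

# Assumed inputs (not taken from the exercise): z = 2.575, error = .02.
bound = min_sample_size(z=2.575, max_error=0.02)
```

Under these assumed inputs the bound falls just above 4,144, which agrees with the answer given; any other proportion would require a smaller sample for the same error.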


CHAPTER 10 

10.2.1. By combining the three extreme class intervals in each tail of the
distribution, a χ² of 13.8 is obtained; thus the hypothesis cannot be rejected.
10.2.3. χ² = 6.86; thus the hypothesis cannot be rejected. 10.2.5. Combining
0 and 1 we obtain χ² = 9.19, just failing of significance at the .05 level.
10.2.7. χ² = 20.65; reject at the .005 level of confidence. Note that the
largest discrepancy between theoretical and observed frequencies occurs for
the class of values 4, 5, ... . 10.2.9. (a) No, as χ²_1 = 4 is significant at the
.05 level. (b) That the 100 packages were sold independently, that is, that
the sale of one package had no effect on the sale of another package, as far as 
color is concerned, and that no selective factor entered in which tended to 
make these 100 customers systematically different from customers of this 
store in general (or customers of other stores, if a generalization to the
customers of other stores is to be made). 10.3.1. (a) χ² = 27; reject at the .005
level. (b) χ² = 275; reject at the .005 level. 10.3.3. χ² = 16; reject at the
.03 level. 10.3.5. χ² = 16; reject at the .03 level. 10.4.1. (a) χ² = *4; do
not reject. (b) χ² = 7.3; .10 > p > .05. (c) χ² = 7.4; reject at the .03
level. 10.4.3. χ² = 10.6; do not reject. 10.4.5. About 3,160. 10.5.1.
χ²_12 = 15; do not reject. 10.5.3. The entries in the table are not frequencies.
There is no way of modifying this table so as to justify a chi-square test of
independence, as there is no way of obtaining frequencies from sums of scores.
10.5.5. χ² = 13.7; reject at the .03 level. Cells containing large theoretical
frequencies contribute relatively little to the value of chi square, as the squared
discrepancies are the same as for the low-frequency cells. 10.6.1. (a) χ² = 10;
the hypothesis cannot be rejected. (b) 2.63 to 18.52.
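
The remark in 10.5.5 about high-frequency cells can be illustrated with a minimal sketch. It assumes the usual form of the statistic, in which each cell contributes (observed − theoretical)²/theoretical; the frequencies below are hypothetical and are not the data of the exercise.

```python
def chi_square_terms(observed, theoretical):
    """Per-cell contributions (o - e)^2 / e to the chi-square statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, theoretical)]

# Hypothetical cells: both miss their theoretical frequency by 6,
# but one has a theoretical frequency of 10 and the other of 100.
terms = chi_square_terms([16, 106], [10, 100])
low_frequency_term, high_frequency_term = terms
```

With equal squared discrepancies, the contribution shrinks in proportion to the theoretical frequency, so the cell with theoretical frequency 100 adds one-tenth as much as the cell with theoretical frequency 10.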

CHAPTER 11 

11.1.1. t_20 = 3.0; thus the hypothesis can be rejected at the .01 level of
confidence. Although this test is valid in that the probability of a type I
error is given by the t tables, it would ordinarily have a very low power because
ordinarily the parameters which would tend to make the critical ratio large
would also tend to make chi square large. 11.2.1. t_9 = 2.33; reject at the .05
level of confidence. The test assumes that the 10 differences can be considered
a random sample from a normally distributed population. As the
samples cannot be considered independent, only one t test can be made.
11.2.3. t_17 = 3.5; reject at the .01 level of confidence. The test assumes that
the two populations of mileages are normal and have equal variance. The
data could be paired at random, discarding one observation, and t computed.
The .05 confidence interval for the first make is 15.5 to 24.5.
CHAPTER 12

12.1.1. (a)

                   Red
            0     1     2     3
White  2   .05   .15    0     0
       1    0    .30   .30    0
       0    0     0    .15   .05


(b) Red: (.05,0), (.45,1), (.45,2), (.05,3). White: (.20,0), (.60,1), (.20,2).
(c) -.30. 12.1.3. (a) H. (b) (c) ^ 92 ; 9 % 76 - (d) % 4 . 12.2.1.
(a) (3x² + 1)/2; (3y² + 1)/2. (b) 3(x² + y²)/(3x² + 1). (c) (6x² + 3)/
(12x² + 4). 12.2.3. (a) (5y + 1)/18. (b) (x + 3xy + y)/(12x + 3). (c)
(18x + 5)/(12x + 3). 12.3.1. m_{y|0} = 4.40, m_{y|1} = 4.17, m_{y|2} = 3.57. No.
12.3.3. m_{y|x} = e − 1. The regression is linear, but as X and Y are independent
the regression line is parallel to the x axis. 12.3.5. n s_{x+y}² = Σ(x + y −
m_{x+y})² = Σ(x + y − m_x − m_y)² = Σ[(x − m_x) + (y − m_y)]² = Σ(x − m_x)²
+ Σ(y − m_y)² + 2Σ(x − m_x)(y − m_y) = n s_x² + n s_y² + 2Σ(x − m_x)(y − m_y).
12.4.1. ρ = 0;
η_yx = .452. Lack of linearity of regression accounts for the discrepancy.

12.4.3. No, for example in Exercise 12.4.1, η_xy = .368. 12.7.1. (a) Yes, a
scatter plot shows that the regression of Y on X is approximately linear.
(b) Grouping into intervals of length 5, with multiples of 5 as mid-points,
yields r = .70. (c) y_p = .594x + 27.8. (d) x_p = .815y − .7. (e) 53.3.
(f) Combining extreme class intervals and testing the hypothesis that the
marginal distribution of X is normal, we obtain χ² = 2.91, a quite good fit.
Similarly, for the marginal distribution of Y, χ² = 1.18, an unusually close fit.
As each marginal sample distribution is approximately normal and as the
regression is linear, the assumption that the bivariate distribution is normal is
reasonable; therefore, we can test hypotheses about ρ. (g) Reject at the .001
level. (h) .62 to .77. (i) Zero. Within the interval -4.6 to 4.6. In the
sample, errors are slightly smaller for the intervals x = 40 and x = 45, but if
the bivariate distribution is normal the errors will not vary systematically 
with x. (j) Zero; 9.5. (k) .07, assuming that errors are distributed normally 

about the prediction line and that our estimate of the standard error is accurate.
12.7.3. Higher for the heterogeneous group. As the standard error of
estimate σ_e is constant, ρ² (= 1 − σ_e²/σ_y²) is higher the larger the value
of σ_y. As ρ ≠ 0 and the regression is linear, σ_y is larger the greater the
range of X. 12.7.5. Psychologist B.
For example, even if a maze has zero reliability, the probability of being able 
to reject, at the p level of confidence, the hypothesis that mean performance 
scores of two groups of animals differ only by random sampling is p, if the 
usual assumptions for the test are satisfied. Low reliabilities of measuring 
instruments increase the probabilities of type II errors but do not affect the 
probabilities of type I errors. 
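
The answers to 12.1.1 can be cross-checked with a short sketch. It reads the twelve joint probabilities listed under 12.1.1 (a) as three rows of four, with columns Red = 0, 1, 2, 3. The row order White = 2, 1, 0 is an assumption, chosen because it is the order that reproduces the value in (c); the variable names are illustrative only.

```python
# Joint probabilities P(White = w, Red = r); each list runs over Red = 0..3.
# Row order (White = 2, 1, 0) is an assumption, not stated in the answer key.
joint = {
    2: [0.05, 0.15, 0.00, 0.00],
    1: [0.00, 0.30, 0.30, 0.00],
    0: [0.00, 0.00, 0.15, 0.05],
}

# Marginal distributions: sum over the other variable.
red_marginal = [sum(joint[w][r] for w in joint) for r in range(4)]
white_marginal = {w: sum(row) for w, row in joint.items()}

# Covariance as E(RW) - E(R)E(W).
mean_red = sum(r * p for r, p in enumerate(red_marginal))
mean_white = sum(w * p for w, p in white_marginal.items())
mean_product = sum(w * r * joint[w][r] for w in joint for r in range(4))
covariance = mean_product - mean_red * mean_white
```

Under this reading the marginals match the pairs given in (b), and the covariance matches the -.30 given in (c), which suggests that (c) reports the covariance of the two counts.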
CHAPTER 13

13.4.1. (a) F_{9,10} = 1.31 (not 1.49). (b) Yes. (c) No, unless we wish to
make this test of randomness of sampling; we already know the ratio of the two
parameters. 13.4.3. (a) F_{24,48} = 2.72; reject at the 2(.005) = .01 level of
confidence. (b) F_{24,48} = 5.44; reject at the .01 level of confidence. (c)
F_{24,48} = 2.72; reject at the .005 level of confidence (by interpolation in the F
tables). 13.4.5. Random samples can be taken of the products produced by
the two methods and the F ratio n_1 s_1²(n_2 − 1)/n_2 s_2²(n_1 − 1) computed. It is
assumed that each of the two distributions sampled is normal. The assumption
of randomness of samples would in some cases be difficult to satisfy.
13.5.1. (a) F_{1,16} = 2.13. (b) Not significant. (c) t_16 = 1.46 = √F_{1,16}. (d) As
we sampled the same population each time, our only conclusion would be that 
our method of sampling was not random or that a rare event had occurred. 
13.5.3. It would be assumed that the cephalic indices in each group (popula¬ 
tion) were normally distributed with the same variance, and that the two 
samples could be considered random. Then F could be used. 13.6.1. (a)
F_{5,9} = 2.39. (b) Yes. (c) That our method of sampling the population is
not random (or that a rare event has occurred). 13.6.3. H = 12.9; reject the
hypothesis of homogeneity of variance at the .025 level. The F test cannot be 
used. The population sampled in each case is, strictly speaking, the increases 
or decreases during this period for all the stocks which would have been selected 
by the particular method. Obviously, this is not the population about which 
the investor would most like to draw an inference; it is the gains or losses from 
the future use of the method which are most important. As in many cases, it 
is impossible to secure a random sample from the population one is most 
interested in; the assumption that populations in the future will have about 
the same parameters as ones in the past may in this and other situations be 
fairly plausible, but there is no way of drawing a purely statistical inference 
about populations whose members are in the future. 13.9.1. (a) F_{3,12} = 7.17.
Using a one-tailed test, reject at the .01 level of confidence. The partition 
into varieties cannot be considered equivalent (statistically) to an arbitrary 
partition of samples drawn from a normal population, nor can the four columns 
of data be considered as four samples drawn at random from four populations 
having the same distributions. By assuming that the four populations have 
the same variance, we can infer that their means differ. In making this test, 
we have eliminated variation attributable to row (block) differences; the 
populations sampled can be considered as having been “corrected” by adjusting
the value of each member according to the block it lies in, so that all blocks
would have the same (estimated) mean. Strictly speaking, our conclusion
concerning variety differences is valid only for the five blocks we selected
randomly; in practice, there might be no reason for doubting that similar 
results would hold for the large area from which we selected the blocks at 
random. If there were only two varieties, a significant F would enable us to 
generalize to the whole area. (b) F_{4,12} = 22.71. Using a one-tailed test,
reject at the .005 level. (A two-tailed test would also permit rejection at the
.005 level.) The same remarks made in (a) apply, except that in this case it is
obvious that we should limit our generalization about block differences to the
four varieties (which were not selected randomly); of course, there may be some
reason in terms of nutrition theory to generalize to other varieties. (c)
F_{3,16} = 1.16. The enormous block differences make the value of F low, showing
the advantage of the double partition design. (d) F would be difficult to
interpret if the interaction estimate were used, as the methods are not randomly
selected; the difficulties involved and the proper method of interpretation
are beyond the scope of this text.
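
The relation used in 13.5.1 (c), that t for two groups equals the square root of the corresponding F with one degree of freedom in the numerator, can be checked with a minimal sketch. The data are hypothetical and the function names are illustrative; the sketch assumes the pooled-variance t statistic and the simple one-way analysis of variance.

```python
from statistics import mean

def pooled_t(a, b):
    """Two-sample t using the pooled estimate of the common variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    within_ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = within_ss / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """F ratio: between-groups mean square over within-groups mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = mean([x for g in groups for x in g])
    between_ms = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within_ms = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return between_ms / within_ms

# Hypothetical samples, not the data of the exercise.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [16.0, 18.0, 14.0, 17.0]

t_value = pooled_t(group_a, group_b)
f_value = one_way_f([group_a, group_b])
```

For any two groups the squared t agrees with F up to rounding, which is why the answer can report 1.46 as the square root of 2.13.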

CHAPTER 14 

14.7.1. 473. 14.7.3. Yes; p = .05. 14.7.5. Passes all three tests. 



INDEX 


Alternating harmonic series, 221 
Analysis of variance, complex, 194-201

computation of, 201-202 
degrees of freedom in, 193, 195, 
199-201 

simple, 183-194 
Arbitrary origin, 60 
Arithmetic mean (see Mean) 
Average, 50-52 

Bernoulli’s theorem, 40 
Biased estimate, 110n.

Binomial distributions, definition of, 
67 

derivation of, 65-67 
fitted to sample data, 73-74 
as limit of hypergeometric, 70-72 
moments of, 67-69 
normal approximation to, 96-98, 
103-104 

Poisson approximation to, 75-77 
Binomial expansion, 67 
Bivariate distributions, 152-179 
definition of, 152 

marginal distributions of, 155-156 
moments of, 156 
normal, 170-175, 252-254 
ellipses of, 170-171, 254 
homoscedasticity of, 253 

Central-limit theorem, 110 

in statistical inference, 110-115 


Central tendency, measures of, 50-52 
Change of variable, 236-237 
Chi square, 123-142 
definition of, 123, 249 
degrees of freedom for, 123, 131, 
135-136, 139 

proof of properties of, 249-250 
sum of two, 123, 250 
table for, 280 

tests, of goodness of fit, 124-133 
of homogeneity, 138-139 
of independence, 133-137 
Combinations, 14 
Computation, of a Poisson, 78 
of r, 175-176 

of sample moments, 57-63 
of sums of squares, 201-202 
of variance, 53 

Conditional distributions, 157-162 
Conditional probability, 25-26, 157-159

Confidence interval, 36-37 

for difference, between means, 117- 
118, 147-148 
between proportions, 119 
for mean, 107-109, 113-114, 145 
with one bound only, 47-49 
for percentile point, 208-209 
for proportion, 114-115 
for variance, 141-142 
Confidence level (see Level of confidence)

Consistent estimate, 110n., 244
Contingency table, tests of independence in, 133-137
Continuous distributions, definition 
of, 88 
model, 94 
moments of, 95 

Continuous functions, 222-223 
Convergence, of a series, 220 
stochastic, 244 

Correlation, product-moment, 163- 
167, 170-176 

Correlation ratio, 168-169 
Courant, R., 213, 222 
Covariance, 162-166 
Cramér, H., 213
Critical ratio, 110-113 
Cumulative distribution functions, of 
continuous populations, 85, 232- 
233 

of finite populations, 6-7 

Definite integral, 90-92, 223-224 
Degrees of freedom, in analysis of 
variance, 193, 195, 199-201 
for chi square, 123, 131, 135-136, 
139 

for t, 144, 200-201 
Density function, 88-94, 232-233 
Derivative, definition of, 86, 224-225 
in finding maxima and minima, 
227-230, 252 
interpretation of, 227 
as proportion or probability density, 86, 88, 232-233
second, 226-227, 230 
as slope, 227-228 
table of, 274 
theorems, 226 
as velocity, 227 
Descriptive statistics, 1 
Dichotomous population, 113, 118 
Difference, among several means, 
185-201 


Difference, between two independent 
normally distributed variates, 
116, 248-249 

between two means, 116-118, 146- 
149, 183-184 

between two proportions, 118-119 
Discrete distributions, definition of, 
82 

logic of inference for, 83 
Distribution-free methods, 204-210 
Distribution functions (see Cumulative distribution functions;
Frequency functions)

Distributions, conditional, 157-162 
continuous, 88 
discrete, 82-83 
finite, 7 

graphic methods of showing, 8 
joint, 240-242 
probability, 22 

(See also other specific types of 
distributions) 

Dixon, W. J., 213, 284 

Edwards, A. L., 200, 214 
Eisenhart, C., 285 
Ellipses of bivariate normal distributions, 170-171, 254
Error, types of, 45-47 
(See also Standard error) 

Estimate, biased, 110n.
consistent, 110n., 244
error of, 167, 173-175
of a percentile, 209
of a proportion, 118-119
unbiased, 110, 146, 181-183, 185-186, 243, 245
of variance, 110-112, 146, 181-183, 185-186, 190, 193-197
Eta, 168-169 

Expectation, mathematical, 51-52, 
233-234, 241-242 
of sample moments about origin, 
242-243 

of sample variance, 245 





Expected sampling distribution (see 
Sampling distribution) 

F, definition of, 180 

distribution of, 180-181, 251 
table for, 282-283 
Factorial, 13 
Feller, W., 213 

Finite population defined, 3-4 
Fisher, R. A., 131, 143, 200, 213,
273, 281 

Fitting to sample data, of binomial, 
73-74 

of normal, 121-122, 131 
of Poisson, 79, 131-132 
Frequency functions, continuous, 88, 
232-233 
discrete, 82 
finite, 5 
model, 94 

Frequency polygon, 8 
Frequency table, 58 
Fry, T. C., 213 

Fundamental theorem of the calculus, 
93, 231-232 


Gamma function, 123, 144, 180, 249 
Gaussian distributions (see Normal 
distributions) 

Geometric mean, 52 
Goodness-of-fit tests, 124-133 
Gosset, W. S., 143 
Graphic methods of showing distributions, 8

Grouping into class intervals, 61-63 
Groups, sampling of several, 185-190 
Guilford, J. P., 214 

Harmonic series, 220-221 
alternating, 221 
Histogram, 8 
Hoel, P. G., 213 


Homogeneity, tests of, 138-139 
of variance, 147, 185-186 
Homoscedasticity, 175 
of normal bivariate distributions, 
253 

Hypergeometric distributions, binomial approximation to, 70-72
definition of, 64 
moments of, 64-65 
Hypothesis testing, logic of, 41-43, 83 

Indefinite integral, 92-93, 231 
Independence, 27, 159, 241-242 
tests of, 133-137 

Independent variables, 116, 240-242 
Inference, statistical, 1-2 
logic of, 41-43, 83 
Infinite sequence, 219-220 
Infinite series, 220-222 
Inflection, point of, 230 
Integrals, definite, 90-92, 223-224 
indefinite, 92-93, 231 
multiple, 237-240 
table of, 275 
Interaction, 195-201 
Intervals, class, 61-63 

Joint distributions, 240-242 

Law of large numbers, 243-244 
Level, of confidence, 35-37, 46-48 
in using F, 181, 186-187 
of significance (see confidence, 
above) 

Limit, of a sequence, 219-220 
of a series, 220-222 
Lindquist, E. F., 200, 214 
Linear regression, 162-167, 252 
Linear transformation, in computation of r, 176
of normal distributions, 102-103, 248

(See also Transformations) 
Logarithms, common, table of, 269-270
natural, table of, 271
Logic of statistical inference, 41-43, 83


McNemar, Q., 200, 214 
Marginal distributions, 155-156, 
174-175, 240-241 
Massey, F. J., Jr., 213 
Matched-pairs signed-ranks test, 209-210

table for, 288 
Mean, 51-52 

of conditional distributions, 159- 
161 

confidence interval for, 107-109, 
113-114, 145 

of continuous distributions, 95 
of finite distributions, 51-52 
sampling distribution of, 17-18, 
110-112 

of several groups, 185-201 
standard error of, 112 
of subsample, 151 
of sum of independent random 
variables, 106-107, 243 
of two groups, 181-184 
(See also Expectation) 

Median, confidence interval for, 209 
of finite population, 51 
Merrington, M., 283 
Mode, of chi-square distribution, 124 
of finite distribution, 50 
of normal distribution, 99 
of t distribution, 144 
Moment-generating functions, of chi- 
square distributions, 249 
definition of, 235-236 
of normal distributions, 247 
Moments, computation of, 57-63 
definition of, 57, 95, 156, 235 
(See also Expectation) 

Mood, A. M., 200, 213, 284 
Multiple integral, 237-240 


Nonparametric methods, 204-210 
Normal bivariate distributions, 170- 
175 

proof of properties of, 252-254 
Normal distributions, as approximation to binomial, 96-98, 103-104
bivariate, 170-175, 252-254 
cumulative, table of, 279 
definition of, 98-99 
density function of, table of, 278 
fitted to sample data, 121-122, 
127-128, 130-131 
linear transformation of, 102-103, 
248 

mean of, 100, 247 
mode of, 99 

moment-generating function of, 247

moments of, 100-101, 247 
proof of properties of, 245-249 
use of, in finding confidence intervals, 107-109, 116-117
in hypothesis testing, 105-107, 
116-117 

variance of, 101, 247 
with zero mean, unit variance, 
102-104 


One-tailed vs. two-tailed tests, 47-49, 106, 184, 210
in using F, 181 

Operator, linear, 52, 95, 234, 240, 242 
Order statistics, 207-208 
Ordered pairs, class of, 4 

Paired t test (see t) 

Parameter, 47-48, 50, 83, 204 
Partition of a sample, 191-202 
Pearson, Karl, 125 

Percentile points, confidence intervals 
for, 208-209 
estimates of, 209 
Permutations, 13 
Peters, C. C., 213
Poisson distributions, computation 
of, 78 

definition of, 78 
as exact, 80-81 
fitted to sample, 79, 131-132 
as limit of binomial, 75-77 
moments of, 78-79 
Population, continuous, 84 
discrete, 82 
finite, 3-4 

Power of a test, 37-40, 106 
Power series, 234 
Prediction equation, 166-167 
Primitive functions, 88, 93, 230-231 
table of, 275 

Probability, calculus, 25-28 
conditional, 25-26, 157-159 
definition of, 21-22 
Probability density, 86, 88, 157-159, 
232-233 

Probability distribution, 22 
(See also Probability density) 
Product moment, 162-163 
Product-moment correlation coefficient, assumption of linearity in,
163-166, 168-169 
computation of, 175-176 
confidence interval for, 171-173 
definition of, 163 
in regression equation, 166-167 
sampling distribution of, 170-171 
testing hypotheses about, 171-173 
transformation to z, 171-173 
table of, 273 

Proportion density, 86, 88 
Proportions, confidence intervals for, 
114-115, 119 

difference between, 118-119 
testing hypotheses about, 113, 118 


Random sampling, 18-20, 41, 241 
Random variables, 51, 108, 240-243 


Range, 53 

Ranks, values as, 51 
Regression, 157-167 
linear, 162-167, 252 
Rider, P., 200, 214 
Robbins, H., 213
Run test, 205-206 
table for, 285-287 


Sample, 17 
size of, 112-113, 143 
Sampling distribution, definition of, 
17-18 

nonparametric, 204 
on various hypotheses, 30-31, 33, 
38-41 

Scatter plot, 173-174 
Sets, 2 

Sign test, 204-205 
table for, 284 

Significance (see Level of confidence) 
Size of sample, 112-113, 143 
Small samples, 143, 146 
Snedecor, G. W., 200, 214 
Squares and square roots, table of, 
256-268 

Standard deviation, of a finite distribution, 53

of a normal distribution, 101, 103, 
105, 112 

(See also Variance) 

Standard error, of a difference, 117 
of estimate, 167, 173-175 
of mean, 112 
Statistic, 17, 50 
Stochastic convergence, 244 
“Student,” 143
Subsample mean, test for, 151 
Sum, of normal variates, 106-107 
of random variables, 242-244 
of squares, computation of, 192-196

Swed, F., 285 
t, definition of, 143, 250
degrees of freedom for, 144, 200-201

distribution of, 143-145, 250-251 
mode of, 144 
after F, 186-187 
paired, 147-149, 198-201 
table of, 281 
tests, 145-151, 198-201 
Taylor, Brook, 234 
Taylor series, 234 
Tchebysheff's inequality, 244-245
Thompson, C. M., 280, 283 
Tolerance limits, 206-207 
Transformations, for computational 
purposes, 59-60, 176 
of normal distributions, 102-103, 
121-122, 248 
r to z, 171-172 
table of, 273 

(See also Linear transformation) 
Trigonometric functions, table of, 272 
Tukey, John W., 187, 288 
Two-tailed tests, 35, 105-106, 110-111, 181, 184
Type I error, 45-47 